Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-07 Thread Kyotaro HORIGUCHI
Hello,

  After all, I have confirmed that this fixes the problem on crash
  recovery of hot-standby botfor 9.3 and HEAD and no problem was
  found except unreadability :(
 
 Ok, committed. Thanks!

Thank you.

 Any concrete suggestions about the readability? Is there some
 particular spot that needs clarifying?

Well, Although I became to see no problem there after I understood how
it works:-), I'll write about two points where I had difficulties to
understand.

|  * (or the checkpoint record itself, if it's a shutdown checkpoint).
|  */
| if (checkPoint.redo  RecPtr)

First, it was a bit tough work to confirm the equivalence between
(redo == RecPtr) and that the checkpoint is shutdown checkpoint.
Although finally I was convinced that it surely holds, that is
actually not the point. The point here is in the first half of the
phrase. The comment might be less perplexing if it were as folowing
even if only shutdown checkpoint satisfies the condition. But it would
occur another quiestion in readers' mind.

|  * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).

Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the next record there.
It would be better for understandability and stability (against
modifying code) to explicitly declare the precondition, like this

| Here RecPtr points the checkpoint record and EndRecPtr points to the
| place for the record just after.


  Surely this is the consequence of illegal operation but I think
  it is also not a issue of assertion - which fires on something
  wrong in design or quite rare cases(this case ?).
 
 Ah, I see. Yes, that's definitely a bug. If you don't hit the
 assertion, because the oldestActiveXID is set in the checkpoint record
 even though wal_level is 'archive', or if you simply have assertions
 disabled, the system will start up in hot standby mode even though
 it's not safe.
 
  So it might be
  better to show message as below on the case.
 
  | FATAL:  Checkpoint doesn't have valid oldest active transaction id
  | HINT: Reading WAL might have been written under insufficient
  | wal_level.

Agreed.

 Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
 wal_level is 'archive'. So the above patch doesn't fix the whole
 problem.
 
 The real bug here is that CheckRequiredParameterValues() tests for
 InArchiveRecovery, when it should be testing for
 ArchiveRecoveryRequested. Otherwise, the checks are not performed when
 going through the crash recovery followed by archive recovery. I
 should've changed that as part of the commit that added the crash
 recovery then archive recovery behavior.
 
 Fixed, thanks for pointing it out!

It's my pleasre.

regrds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-05 Thread Kyotaro HORIGUCHI
Hello, 

After all, I have confirmed that this fixes the problem on crash
recovery of hot-standby botfor 9.3 and HEAD and no problem was
found except unreadability :(

By the way, I moderately want to fix an assertion message to a
ordinary one. Details are below.


The server stops with following message during restarting after
crash requesting archive recovery when the WAL has been produced
with the wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: 
xlog.c, Line: 6799)
| LOG:  startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of illegal operation but I think
it is also not a issue of assertion - which fires on something
wrong in design or quite rare cases(this case ?). So it might be
better to show message as below on the case.

| FATAL:  Checkpoint doesn't have valid oldest active transaction id
| HINT:  Reading WAL might have been written under insufficient wal_level.

This could do in this way,

==
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
if (wasShutdown)
oldestActiveXID = 
PrescanPreparedTransactions(xids, nxids);
else
+   {
oldestActiveXID = checkPoint.oldestActiveXid;
+   if (!TransactionIdIsValid(oldestActiveXID))
+   ereport(FATAL,
+   (errmsg(Checkpoint 
doesn't have valid oldest active transaction id),
+errhint(Reading WAL 
might have been written under insufficient wal_level.)));
+   }
Assert(TransactionIdIsValid(oldestActiveXID));
 
/* Tell procarray about the range of xids it has to 
deal with */
=


What do you think about this? Feel free dumping this if you feel
negative on this.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-05 Thread Heikki Linnakangas

On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:

Hello,

After all, I have confirmed that this fixes the problem on crash
recovery of hot-standby botfor 9.3 and HEAD and no problem was
found except unreadability :(


Ok, committed. Thanks!

Any concrete suggestions about the readability? Is there some particular 
spot that needs clarifying?



By the way, I moderately want to fix an assertion message to a
ordinary one. Details are below.


The server stops with following message during restarting after
crash requesting archive recovery when the WAL has been produced
with the wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: 
xlog.c, Line: 6799)
| LOG:  startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of illegal operation but I think
it is also not a issue of assertion - which fires on something
wrong in design or quite rare cases(this case ?).


Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion, 
because the oldestActiveXID is set in the checkpoint record even though 
wal_level is 'archive', or if you simply have assertions disabled, the 
system will start up in hot standby mode even though it's not safe.



So it might be
better to show message as below on the case.

| FATAL:  Checkpoint doesn't have valid oldest active transaction id
| HINT:  Reading WAL might have been written under insufficient wal_level.

This could do in this way,
==
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
if (wasShutdown)
oldestActiveXID = 
PrescanPreparedTransactions(xids, nxids);
else
+   {
oldestActiveXID = checkPoint.oldestActiveXid;
+   if (!TransactionIdIsValid(oldestActiveXID))
+   ereport(FATAL,
+   (errmsg(Checkpoint doesn't 
have valid oldest active transaction id),
+errhint(Reading WAL might 
have been written under insufficient wal_level.)));
+   }
Assert(TransactionIdIsValid(oldestActiveXID));

/* Tell procarray about the range of xids it has to 
deal with */
=


What do you think about this? Feel free dumping this if you feel
negative on this.


Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though 
wal_level is 'archive'. So the above patch doesn't fix the whole problem.


The real bug here is that CheckRequiredParameterValues() tests for 
InArchiveRecovery, when it should be testing for 
ArchiveRecoveryRequested. Otherwise, the checks are not performed when 
going through the crash recovery followed by archive recovery. I 
should've changed that as part of the commit that added the crash 
recovery then archive recovery behavior.


Fixed, thanks for pointing it out!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas 
hlinnakan...@vmware.com wrote in 53108506.2010...@vmware.com
  Yes, but the same stuation could be made by restarting crashed
  secondary.
 
 Yeah.
 
  I have no idea about the scenario on whitch this behavior was regarded
  as
  undesirable but anyway I think that the secondry should start
  accepting
  client just after crash recovery is completed.
 
 Agreed, this is a bug.
 
 I don't think your patch is the right fix for this though. Setting
 minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr
 points to the end of the last read and replayed record. What's wrong
 in this case is lastReplayedEndRecptr. At the beginning of recovery,
 it's initialized to the REDO point, but with a shutdown checkpoint,
 that's not quite right. When starting from a shutdown checkpoint, REDO
 points to the beginning of the shutdown record, but we've already
 effectively replayed it. The next record we replay is the one after
 the checkpoint.

It's more reasonable. I felt uncelar about that but I forgot to
doubt the correctness of lastReplayedEndRecptr then, but surely
the shutdown record itself was effectively alredy replayed when
the recored is inseretd.

 To see that, I added some elog(LOG) calls:
 
 ~/pgsql.93stable$ bin/postmaster -D data
 LOG:  database system was shut down at 2014-02-28 14:06:18 EET
 LOG:  ReadCheckpointRecord: 0/16479C98
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 ^CLOG:  received fast shutdown request
 LOG:  aborting any active transactions
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down
 LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
 checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid
 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in
 DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
 LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
 LOG:  database system is shut down
 ~/pgsql.93stable$ bin/postmaster -D data
 LOG:  database system was shut down at 2014-02-28 14:06:23 EET
 LOG:  ReadCheckpointRecord: 0/16479D00
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 Killed
 
 At this point, the last record is the shutdown checkpoint, beginning
 at 16479D00, and the server has been killed (immediate shutdown).
 
 ~/pgsql.93stable$ cp recovery.conf data/recovery.conf
 ~/pgsql.93stable$ bin/postmaster -D data
 LOG: database system was interrupted; last known up at 2014-02-28
 14:06:29 EET
 LOG:  entering standby mode
 LOG:  ReadCheckpointRecord: 0/16479D00
 LOG: database system was not properly shut down; automatic recovery in
 progress
 LOG:  record with zero length at 0/16479D68
 LOG:  reached end of WAL in pg_xlog, entering archive recovery
 LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
 FATAL: could not connect to the primary server: could not connect to
 server: Connection refused
 ...
 
 Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
 is set to the *beginning* of the checkpoint record, even though the
 checkpoint record has already been effectively replayed, by the feat
 of starting recovery from it. EndRecPtr correctly points to the end of
 the checkpoint record. Because of the incorrect lastReplayedEndRecPtr
 value, the CheckRecoveryConsistency() call concludes that it's not
 consistent.

I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)

I agree that the fix is appropriate.

 I believe the attached fix is the right way to fix this.

It also worked for me. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Ouch! It brought another bug.

 I completely understood the behavior thanks to your detailed
 explanation. (And how to use log messages effectively :-)

Sorry, I just found that it's wrong, and found another problem
brought by your patch.

 I agree that the fix is appropriate.
 
  I believe the attached fix is the right way to fix this.
 
 It also worked for me. Thank you.

| * as if we had just replayed the record before the REDO location
| * (or the checkpoint record itself, if it's a shutdown checkpoint).

The test script following raises assertion failure. It's added
with 'non-shutdown' checkpoint' just before shutting down
immediately. Starting server aborts with the following message.

| LOG:  database system was not properly shut down; automatic recovery in 
progress
| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: 
xlog.c, Line: 6771)
| LOG:  startup process (PID 28561) was terminated by signal 6: Aborted

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

===
#! /bin/sh

killall postgres
rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
psql postgres -c 'checkpoint'
pg_ctl stop -m i
cat  $PGDATA/recovery.conf EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 
keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat  $PGDATA/postgresql.conf EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Correcting one point of my last mail.

 Ouch! It brought another bug.

My patch also did.

regards,


  I completely understood the behavior thanks to your detailed
  explanation. (And how to use log messages effectively :-)
 
 Sorry, I just found that it's wrong, and found another problem
 brought by your patch.
 
  I agree that the fix is appropriate.
  
   I believe the attached fix is the right way to fix this.
  
  It also worked for me. Thank you.
 
 | * as if we had just replayed the record before the REDO location
 | * (or the checkpoint record itself, if it's a shutdown checkpoint).
 
 The test script following raises assertion failure. It's added
 with 'non-shutdown' checkpoint' just before shutting down
 immediately. Starting server aborts with the following message.
 
 | LOG:  database system was not properly shut down; automatic recovery in 
 progress
 | TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), 
 File: xlog.c, Line: 6771)
 | LOG:  startup process (PID 28561) was terminated by signal 6: Aborted
 
 regards,
 
 -- 
 Kyotaro Horiguchi
 NTT Open Source Software Center
 
 ===
 #! /bin/sh
 
 killall postgres
 rm -rf $PGDATA/*
 initdb
 pg_ctl start -w
 sleep 1
 psql postgres -c 'checkpoint'
 pg_ctl stop -m i
 cat  $PGDATA/recovery.conf EOF
 standby_mode = 'on'
 primary_conninfo = 'host=localhost port= user=repuser 
 application_name=pm01 keepalives_idle=60 keepalives_interval=5 
 keepalives_count=5'
 #restore_command = '/bin/true'
 recovery_target_timeline = 'latest'
 EOF
 cat  $PGDATA/postgresql.conf EOF
 #log_min_messages = debug5
 hot_standby = on
 EOF
 pg_ctl start
 

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Hello,

 | * as if we had just replayed the record before the REDO location
 | * (or the checkpoint record itself, if it's a shutdown checkpoint).
 
 The test script following raises assertion failure. It's added
 with 'non-shutdown' checkpoint' just before shutting down
 immediately. Starting server aborts with the following message.
 
 | LOG:  database system was not properly shut down; automatic recovery in 
 progress
 | TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), 
 File: xlog.c, Line: 6771)
 | LOG:  startup process (PID 28561) was terminated by signal 6: Aborted

This is because the checkpoint was done with 'wal_level =
minimal'. The server restarts correctly by starting the server
with 'wal_level = hot_standby' at first.

It looks a mere mis-op. The log message looks unfriendly but I'm
uncertain of the necessity of changing it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Kyotaro HORIGUCHI
Hello, we found that hot standby doesn't came up under certain
condition. This occurs for 9.3 and 9.4dev.

The recovery process stays on 'incosistent' state forever when
the server has crashed before any wal record is inserted after
the last checkpoint.

This seems to be because EndRecPtr is set to minRecoveryPoint at
the end of crash recovery in ReadRecord. EndRecPtr here points to
the beginning of the next record to the record alread read, just
after the last checkpoint and no record is there in this
case. Then successive CheckRecoveryConsistency won't consider
that the 'consistent state' comes in spite that actually it is
already consistent.

I diffidently think that lastReplayedEndRecPtr is suitable there.

The script attached first causes the situation. Run it, then
after the server complains that it can't connect to the primary,
connecting it by psql results in,

| psql: FATAL:  the database system is starting up

The attached patch fixes the problem on 9.4dev.

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh

# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat  $PGDATA/recovery.conf EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 
keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat  $PGDATA/postgresql.conf EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 483d5c3..f1f54f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4496,7 +4496,15 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 ControlFile-state = DB_IN_ARCHIVE_RECOVERY;
 if (ControlFile-minRecoveryPoint  EndRecPtr)
 {
-	ControlFile-minRecoveryPoint = EndRecPtr;
+	/*
+	 * Altough EndRecPtr is the right value for
+	 * minRecoveryPoint in archive recovery, it is a bit too
+	 * far when the last checkpoint record is the last wal
+	 * record here. Use lastReplayedEndRecPtr as
+	 * minRecoveryPoint point to start hot stanby just after.
+	 */
+	ControlFile-minRecoveryPoint =
+		XLogCtl-lastReplayedEndRecPtr;
 	ControlFile-minRecoveryPointTLI = ThisTimeLineID;
 }
 /* update local copy */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Kyotaro HORIGUCHI
Ouch. this is the same issue to the mail below,

http://www.postgresql.org/message-id/53104595.6060...@lab.ntt.co.jp

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Andres Freund
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
 The recovery process stays on 'incosistent' state forever when
 the server has crashed before any wal record is inserted after
 the last checkpoint.

 # killall postgres
 # rm -rf $PGDATA/*
 initdb
 pg_ctl start -w
 sleep 1
 pg_ctl stop -m i
 cat  $PGDATA/recovery.conf EOF
 standby_mode = 'on'
 primary_conninfo = 'host=localhost port= user=repuser 
 application_name=pm01 keepalives_idle=60 keepalives_interval=5 
 keepalives_count=5'
 #restore_command = '/bin/true'
 recovery_target_timeline = 'latest'
 EOF
 cat  $PGDATA/postgresql.conf EOF
 #log_min_messages = debug5
 hot_standby = on
 EOF
 pg_ctl start

Uh. So, if I understand correctly, what you did is to convert a normal
live pg, into a replica that doesn't have a upstream node, right?
Normally the primary will just do an additional write shortly
afterwards, resolving this situation?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Kyotaro HORIGUCHI
Hello,

2014/02/28 18:07 Andres Freund :

 On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
  The recovery process stays on 'incosistent' state forever when
  the server has crashed before any wal record is inserted after
  the last checkpoint.

  # killall postgres
  # rm -rf $PGDATA/*
  initdb
  pg_ctl start -w
  sleep 1
  pg_ctl stop -m i
  cat  $PGDATA/recovery.conf EOF
  standby_mode = 'on'
  primary_conninfo = 'host=localhost port= user=repuser
application_name=pm01 keepalives_idle=60 keepalives_interval=5
keepalives_count=5'
  #restore_command = '/bin/true'
  recovery_target_timeline = 'latest'
  EOF
  cat  $PGDATA/postgresql.conf EOF
  #log_min_messages = debug5
  hot_standby = on
  EOF
  pg_ctl start

 Uh. So, if I understand correctly, what you did is to convert a normal
 live pg, into a replica that doesn't have a upstream node, right?

Yes, but the same stuation could be made by restarting crashed secondary. I
didn't tried it since my console is far behind now...

Can a restart of secondry after crashing just after the completion of
restartpoint with no further records make the same situation?

I have no idea about the scenario on whitch this behavior was regarded as
undesirable but anyway I think that the secondry should start accepting
client just after crash recovery is completed.

 Normally the primary will just do an additional write shortly
 afterwards, resolving this situation?

Maybe so. I haven't tried though.

regards,
-- 
Kyotaro Horiguchi
NTT Opensource Software Center


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Heikki Linnakangas

On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote:

Hello,

2014/02/28 18:07 Andres Freund :


On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:

The recovery process stays on 'incosistent' state forever when
the server has crashed before any wal record is inserted after
the last checkpoint.



# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat  $PGDATA/recovery.conf EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port= user=repuser

application_name=pm01 keepalives_idle=60 keepalives_interval=5
keepalives_count=5'

#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat  $PGDATA/postgresql.conf EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start


Uh. So, if I understand correctly, what you did is to convert a normal
live pg, into a replica that doesn't have a upstream node, right?


Yes, but the same stuation could be made by restarting crashed secondary.


Yeah.


I have no idea about the scenario on whitch this behavior was regarded as
undesirable but anyway I think that the secondry should start accepting
client just after crash recovery is completed.


Agreed, this is a bug.

I don't think your patch is the right fix for this though. Setting 
minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr points 
to the end of the last read and replayed record. What's wrong in this 
case is lastReplayedEndRecptr. At the beginning of recovery, it's 
initialized to the REDO point, but with a shutdown checkpoint, that's 
not quite right. When starting from a shutdown checkpoint, REDO points 
to the beginning of the shutdown record, but we've already effectively 
replayed it. The next record we replay is the one after the checkpoint.


To see that, I added some elog(LOG) calls:

~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:18 EET
LOG:  ReadCheckpointRecord: 0/16479C98
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
^CLOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG - 
checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393; 
oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest 
multi 1 in DB 1; oldest running xid 0; shutdown

LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG:  database system is shut down
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:23 EET
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
Killed

At this point, the last record is the shutdown checkpoint, beginning at 
16479D00, and the server has been killed (immediate shutdown).


~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was interrupted; last known up at 2014-02-28 
14:06:29 EET

LOG:  entering standby mode
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system was not properly shut down; automatic recovery in 
progress

LOG:  record with zero length at 0/16479D68
LOG:  reached end of WAL in pg_xlog, entering archive recovery
LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
FATAL:  could not connect to the primary server: could not connect to 
server: Connection refused

...

Recovery starts from the checkpoint record, but lastReplayedEndRecPtr is 
set to the *beginning* of the checkpoint record, even though the 
checkpoint record has already been effectively replayed, by the feat of 
starting recovery from it. EndRecPtr correctly points to the end of the 
checkpoint record. Because of the incorrect lastReplayedEndRecPtr value, 
the CheckRecoveryConsistency() call concludes that it's not consistent.


I believe the attached fix is the right way to fix this.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7399fd4..a36354f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5457,13 +5457,17 @@ StartupXLOG(void)
 
 		/*
 		 * Initialize shared variables for tracking progress of WAL replay,
-		 * as if we had just replayed the record before the REDO location.
+		 * as if we had just replayed the record before the REDO location
+		 * (or the checkpoint record itself, if it's a shutdown checkpoint).
 		 */
 		SpinLockAcquire(xlogctl-info_lck);
-		xlogctl-replayEndRecPtr = checkPoint.redo;
+		if (checkPoint.redo  RecPtr)
+			xlogctl-replayEndRecPtr = checkPoint.redo;
+		else
+			xlogctl-replayEndRecPtr = EndRecPtr;
 		xlogctl-replayEndTLI = ThisTimeLineID;
-		xlogctl-lastReplayedEndRecPtr = checkPoint.redo;
-		xlogctl-lastReplayedTLI = ThisTimeLineID;
+		xlogctl-lastReplayedEndRecPtr = xlogctl-replayEndRecPtr;
+		xlogctl-lastReplayedTLI = xlogctl-replayEndTLI;