Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello,

>> After all, I have confirmed that this fixes the problem on crash
>> recovery of hot standby both for 9.3 and HEAD, and no problem was
>> found except unreadability :(
> Ok, committed. Thanks!

Thank you.

> Any concrete suggestions about the readability? Is there some
> particular spot that needs clarifying?

Well, although I see no problem there now that I understand how it
works :-), I'll describe the two points where I had difficulty
understanding.

| * (or the checkpoint record itself, if it's a shutdown checkpoint).
| */
| if (checkPoint.redo < RecPtr)

First, it was a bit of work to confirm the equivalence between
(redo == RecPtr) and the checkpoint being a shutdown checkpoint.
Although I was finally convinced that it surely holds, that is
actually not the point. The point here is the first half of the
phrase. The comment might be less perplexing if it were as follows,
even if only a shutdown checkpoint satisfies the condition. But that
would raise another question in readers' minds.

| * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).

Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the record just after
it. It would be better for understandability and stability (against
future code changes) to state the precondition explicitly, like this:

| Here RecPtr points to the checkpoint record and EndRecPtr points to
| the position just after it.

>> Surely this is the consequence of an illegal operation, but I think
>> it is also not a matter for an assertion - which fires on something
>> wrong in the design, or in quite rare cases (this case?).
> Ah, I see. Yes, that's definitely a bug. If you don't hit the
> assertion, because the oldestActiveXID is set in the checkpoint
> record even though wal_level is 'archive', or if you simply have
> assertions disabled, the system will start up in hot standby mode
> even though it's not safe.

>> So it might be better to show a message like the below in that case.
>> | FATAL: Checkpoint doesn't have valid oldest active transaction id
>> | HINT: Reading WAL might have been written under insufficient
>> | wal_level.
> Agreed.
>
> Hmm. When I test that with 9.2, oldestActiveXID is not 0, even
> though wal_level is 'archive'. So the above patch doesn't fix the
> whole problem. The real bug here is that
> CheckRequiredParameterValues() tests for InArchiveRecovery, when it
> should be testing for ArchiveRecoveryRequested. Otherwise, the
> checks are not performed when going through crash recovery followed
> by archive recovery. I should've changed that as part of the commit
> that added the crash recovery then archive recovery behavior.
>
> Fixed, thanks for pointing it out!

It's my pleasure.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
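The InArchiveRecovery vs. ArchiveRecoveryRequested distinction fixed
above can be modeled with a tiny standalone sketch. This is not
PostgreSQL source; only the two flag names come from the discussion,
everything else is illustrative:

```c
#include <stdbool.h>

/*
 * Toy model of the gating bug discussed above.  When a crashed
 * standby restarts, it first runs plain crash recovery; during that
 * phase InArchiveRecovery is still false even though recovery.conf
 * was found, i.e. ArchiveRecoveryRequested is true.  A safety check
 * gated on InArchiveRecovery is therefore skipped.
 */
typedef struct RecoveryState
{
	bool		ArchiveRecoveryRequested;	/* recovery.conf present at startup */
	bool		InArchiveRecovery;			/* currently replaying archived WAL */
} RecoveryState;

/* Buggy gate: misses the crash-recovery-then-archive-recovery phase. */
static bool
checks_run_buggy(const RecoveryState *s)
{
	return s->InArchiveRecovery;
}

/* Fixed gate: run the checks whenever archive recovery was requested. */
static bool
checks_run_fixed(const RecoveryState *s)
{
	return s->ArchiveRecoveryRequested;
}
```

With the fixed gate, the wal_level/hot_standby parameter checks also
run during the crash-recovery phase that precedes archive recovery.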
Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello,

> After all, I have confirmed that this fixes the problem on crash
> recovery of hot standby both for 9.3 and HEAD, and no problem was
> found except unreadability :(

By the way, I would moderately like to change an assertion failure
into an ordinary error message. Details are below.

The server stops with the following message when restarting after a
crash, requesting archive recovery, if the WAL was produced with a
wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: "xlog.c", Line: 6799)
| LOG: startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of an illegal operation, but I think it
is also not a matter for an assertion - which fires on something wrong
in the design, or in quite rare cases (this case?). So it might be
better to show a message like the below in that case.

| FATAL: Checkpoint doesn't have valid oldest active transaction id
| HINT: Reading WAL might have been written under insufficient wal_level.

This could be done this way:

==
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
 	if (wasShutdown)
 		oldestActiveXID = PrescanPreparedTransactions(xids, nxids);
 	else
+	{
 		oldestActiveXID = checkPoint.oldestActiveXid;
+		if (!TransactionIdIsValid(oldestActiveXID))
+			ereport(FATAL,
+					(errmsg("Checkpoint doesn't have valid oldest active transaction id"),
+					 errhint("Reading WAL might have been written under insufficient wal_level.")));
+	}
 	Assert(TransactionIdIsValid(oldestActiveXID));

 	/* Tell procarray about the range of xids it has to deal with */
==

What do you think about this? Feel free to dump this if you feel
negative about it.
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Hot standby doesn't come up on some situation.
On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> After all, I have confirmed that this fixes the problem on crash
> recovery of hot standby both for 9.3 and HEAD, and no problem was
> found except unreadability :(

Ok, committed. Thanks!

Any concrete suggestions about the readability? Is there some
particular spot that needs clarifying?

> By the way, I would moderately like to change an assertion failure
> into an ordinary error message. Details are below.
>
> The server stops with the following message when restarting after a
> crash, requesting archive recovery, if the WAL was produced with a
> wal_level below WAL_LEVEL_HOT_STANDBY.
>
> | TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: "xlog.c", Line: 6799)
> | LOG: startup process (PID 7270) was terminated by signal 6: Aborted
>
> Surely this is the consequence of an illegal operation, but I think
> it is also not a matter for an assertion - which fires on something
> wrong in the design, or in quite rare cases (this case?).

Ah, I see. Yes, that's definitely a bug. If you don't hit the
assertion, because the oldestActiveXID is set in the checkpoint record
even though wal_level is 'archive', or if you simply have assertions
disabled, the system will start up in hot standby mode even though
it's not safe.

> So it might be better to show a message like the below in that case.
>
> | FATAL: Checkpoint doesn't have valid oldest active transaction id
> | HINT: Reading WAL might have been written under insufficient wal_level.
> This could be done this way:
>
> ==
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index e3d5e10..bb6922a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6789,7 +6789,13 @@ StartupXLOG(void)
>  	if (wasShutdown)
>  		oldestActiveXID = PrescanPreparedTransactions(xids, nxids);
>  	else
> +	{
>  		oldestActiveXID = checkPoint.oldestActiveXid;
> +		if (!TransactionIdIsValid(oldestActiveXID))
> +			ereport(FATAL,
> +					(errmsg("Checkpoint doesn't have valid oldest active transaction id"),
> +					 errhint("Reading WAL might have been written under insufficient wal_level.")));
> +	}
>  	Assert(TransactionIdIsValid(oldestActiveXID));
>
>  	/* Tell procarray about the range of xids it has to deal with */
> ==
>
> What do you think about this? Feel free to dump this if you feel
> negative about it.

Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
wal_level is 'archive'. So the above patch doesn't fix the whole
problem. The real bug here is that CheckRequiredParameterValues()
tests for InArchiveRecovery, when it should be testing for
ArchiveRecoveryRequested. Otherwise, the checks are not performed when
going through crash recovery followed by archive recovery. I should've
changed that as part of the commit that added the crash recovery then
archive recovery behavior.

Fixed, thanks for pointing it out!

- Heikki
Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello,

At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas
hlinnakan...@vmware.com wrote in 53108506.2010...@vmware.com
>> Yes, but the same situation could be made by restarting a crashed
>> secondary.
> Yeah.
>> I have no idea about the scenario in which this behavior was
>> regarded as undesirable, but anyway I think that the secondary
>> should start accepting clients just after crash recovery is
>> completed.
> Agreed, this is a bug. I don't think your patch is the right fix for
> this though. Setting minRecoveryPoint to EndRecPtr is the right
> thing to do; EndRecPtr points to the end of the last read and
> replayed record. What's wrong in this case is
> lastReplayedEndRecPtr. At the beginning of recovery, it's
> initialized to the REDO point, but with a shutdown checkpoint,
> that's not quite right. When starting from a shutdown checkpoint,
> REDO points to the beginning of the shutdown record, but we've
> already effectively replayed it. The next record we replay is the
> one after the checkpoint.

That's more reasonable. I felt unclear about that, but I forgot to
doubt the correctness of lastReplayedEndRecPtr at the time; surely
the shutdown checkpoint record itself has effectively been replayed
already by the time the record is inserted.
> To see that, I added some elog(LOG) calls:
>
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was shut down at 2014-02-28 14:06:18 EET
> LOG: ReadCheckpointRecord: 0/16479C98
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> ^CLOG: received fast shutdown request
> LOG: aborting any active transactions
> LOG: autovacuum launcher shutting down
> LOG: shutting down
> LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG - checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0
> LOG: database system is shut down
>
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was shut down at 2014-02-28 14:06:23 EET
> LOG: ReadCheckpointRecord: 0/16479D00
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> Killed
>
> At this point, the last record is the shutdown checkpoint, beginning
> at 16479D00, and the server has been killed (immediate shutdown).
>
> ~/pgsql.93stable$ cp recovery.conf data/recovery.conf
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was interrupted; last known up at 2014-02-28 14:06:29 EET
> LOG: entering standby mode
> LOG: ReadCheckpointRecord: 0/16479D00
> LOG: database system was not properly shut down; automatic recovery in progress
> LOG: record with zero length at 0/16479D68
> LOG: reached end of WAL in pg_xlog, entering archive recovery
> LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
> FATAL: could not connect to the primary server: could not connect to server: Connection refused
> ...
>
> Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
> is set to the *beginning* of the checkpoint record, even though the
> checkpoint record has already been effectively replayed, by the feat
> of starting recovery from it.
> EndRecPtr correctly points to the end of the checkpoint record.
> Because of the incorrect lastReplayedEndRecPtr value, the
> CheckRecoveryConsistency() call concludes that it's not consistent.

I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-) I agree
that the fix is appropriate.

> I believe the attached fix is the right way to fix this.

It also worked for me. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Hot standby doesn't come up on some situation.
Ouch! It brought another bug.

>> I completely understood the behavior thanks to your detailed
>> explanation. (And how to use log messages effectively :-) I agree
>> that the fix is appropriate.
> I believe the attached fix is the right way to fix this.
>> It also worked for me. Thank you.

Sorry, I just found that it's wrong, and found another problem brought
by your patch.

> | * as if we had just replayed the record before the REDO location
> | * (or the checkpoint record itself, if it's a shutdown checkpoint).

The following test script raises an assertion failure. A
'non-shutdown' checkpoint is added just before shutting down
immediately. Starting the server aborts with the following message.

| LOG: database system was not properly shut down; automatic recovery in progress
| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: "xlog.c", Line: 6771)
| LOG: startup process (PID 28561) was terminated by signal 6: Aborted

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

===
#! /bin/sh
killall postgres
rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
psql postgres -c 'checkpoint'
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
Re: [HACKERS] Hot standby doesn't come up on some situation.
Correcting one point in my last mail.

> Ouch! It brought another bug.

My patch also did.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello,

>> | * as if we had just replayed the record before the REDO location
>> | * (or the checkpoint record itself, if it's a shutdown checkpoint).
> The following test script raises an assertion failure. A
> 'non-shutdown' checkpoint is added just before shutting down
> immediately. Starting the server aborts with the following message.
>
> | LOG: database system was not properly shut down; automatic recovery in progress
> | TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: "xlog.c", Line: 6771)
> | LOG: startup process (PID 28561) was terminated by signal 6: Aborted

This is because the checkpoint was done with 'wal_level = minimal'.
The server restarts correctly when it is started with 'wal_level =
hot_standby' from the beginning, so it looks like a mere
mis-operation. The log message looks unfriendly, but I'm uncertain of
the necessity of changing it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
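For reference, a sketch of the configuration implied above, using the
9.3-era parameter names from this thread (the standby-count value is
an assumption, everything else comes from the discussion):

```
# postgresql.conf on the server that produces the WAL
wal_level = hot_standby    # 'minimal' or 'archive' WAL cannot support hot standby

# postgresql.conf on the standby
hot_standby = on
```

With wal_level below hot_standby on the WAL producer, the checkpoint
record carries no valid oldestActiveXid, which is exactly what the
assertion above trips on.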
Re: [HACKERS] Hot standby doesn't come up on some situation.
Ouch. This is the same issue as in the mail below:

http://www.postgresql.org/message-id/53104595.6060...@lab.ntt.co.jp

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Hot standby doesn't come up on some situation.
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
> The recovery process stays in an 'inconsistent' state forever when
> the server has crashed before any WAL record is inserted after the
> last checkpoint.
>
> # killall postgres
> # rm -rf $PGDATA/*
> initdb
> pg_ctl start -w
> sleep 1
> pg_ctl stop -m i
> cat > $PGDATA/recovery.conf <<EOF
> standby_mode = 'on'
> primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
> #restore_command = '/bin/true'
> recovery_target_timeline = 'latest'
> EOF
> cat >> $PGDATA/postgresql.conf <<EOF
> #log_min_messages = debug5
> hot_standby = on
> EOF
> pg_ctl start

Uh. So, if I understand correctly, what you did is to convert a normal
live pg into a replica that doesn't have an upstream node, right?
Normally the primary will just do an additional write shortly
afterwards, resolving this situation?

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello,

2014/02/28 18:07 Andres Freund:
> On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
>> The recovery process stays in an 'inconsistent' state forever when
>> the server has crashed before any WAL record is inserted after the
>> last checkpoint.
>>
>> # killall postgres
>> # rm -rf $PGDATA/*
>> initdb
>> pg_ctl start -w
>> sleep 1
>> pg_ctl stop -m i
>> cat > $PGDATA/recovery.conf <<EOF
>> standby_mode = 'on'
>> primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
>> #restore_command = '/bin/true'
>> recovery_target_timeline = 'latest'
>> EOF
>> cat >> $PGDATA/postgresql.conf <<EOF
>> #log_min_messages = debug5
>> hot_standby = on
>> EOF
>> pg_ctl start
>
> Uh. So, if I understand correctly, what you did is to convert a
> normal live pg into a replica that doesn't have an upstream node,
> right?

Yes, but the same situation could be made by restarting a crashed
secondary. I didn't try it since my console is far away now... Can a
restart of the secondary, after it crashes just after the completion
of a restartpoint with no further records, make the same situation?

I have no idea about the scenario in which this behavior was regarded
as undesirable, but anyway I think that the secondary should start
accepting clients just after crash recovery is completed.

> Normally the primary will just do an additional write shortly
> afterwards, resolving this situation?

Maybe so. I haven't tried though.

regards,

-- 
Kyotaro Horiguchi
NTT Opensource Software Center
Re: [HACKERS] Hot standby doesn't come up on some situation.
On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> 2014/02/28 18:07 Andres Freund:
>> On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
>>> The recovery process stays in an 'inconsistent' state forever when
>>> the server has crashed before any WAL record is inserted after the
>>> last checkpoint.
>>>
>>> # killall postgres
>>> # rm -rf $PGDATA/*
>>> initdb
>>> pg_ctl start -w
>>> sleep 1
>>> pg_ctl stop -m i
>>> cat > $PGDATA/recovery.conf <<EOF
>>> standby_mode = 'on'
>>> primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
>>> #restore_command = '/bin/true'
>>> recovery_target_timeline = 'latest'
>>> EOF
>>> cat >> $PGDATA/postgresql.conf <<EOF
>>> #log_min_messages = debug5
>>> hot_standby = on
>>> EOF
>>> pg_ctl start
>>
>> Uh. So, if I understand correctly, what you did is to convert a
>> normal live pg into a replica that doesn't have an upstream node,
>> right?
>
> Yes, but the same situation could be made by restarting a crashed
> secondary.

Yeah.

> I have no idea about the scenario in which this behavior was
> regarded as undesirable, but anyway I think that the secondary
> should start accepting clients just after crash recovery is
> completed.

Agreed, this is a bug. I don't think your patch is the right fix for
this though. Setting minRecoveryPoint to EndRecPtr is the right thing
to do; EndRecPtr points to the end of the last read and replayed
record. What's wrong in this case is lastReplayedEndRecPtr. At the
beginning of recovery, it's initialized to the REDO point, but with a
shutdown checkpoint, that's not quite right. When starting from a
shutdown checkpoint, REDO points to the beginning of the shutdown
record, but we've already effectively replayed it. The next record we
replay is the one after the checkpoint.
To see that, I added some elog(LOG) calls:

~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was shut down at 2014-02-28 14:06:18 EET
LOG: ReadCheckpointRecord: 0/16479C98
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
^CLOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG - checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG: database system is shut down

~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was shut down at 2014-02-28 14:06:23 EET
LOG: ReadCheckpointRecord: 0/16479D00
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Killed

At this point, the last record is the shutdown checkpoint, beginning
at 16479D00, and the server has been killed (immediate shutdown).

~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data
LOG: database system was interrupted; last known up at 2014-02-28 14:06:29 EET
LOG: entering standby mode
LOG: ReadCheckpointRecord: 0/16479D00
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/16479D68
LOG: reached end of WAL in pg_xlog, entering archive recovery
LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
FATAL: could not connect to the primary server: could not connect to server: Connection refused
...

Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
is set to the *beginning* of the checkpoint record, even though the
checkpoint record has already been effectively replayed, by the feat
of starting recovery from it.
EndRecPtr correctly points to the end of the checkpoint record.
Because of the incorrect lastReplayedEndRecPtr value, the
CheckRecoveryConsistency() call concludes that it's not consistent.

I believe the attached fix is the right way to fix this.

- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7399fd4..a36354f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5457,13 +5457,17 @@ StartupXLOG(void)
 
 	/*
 	 * Initialize shared variables for tracking progress of WAL replay,
-	 * as if we had just replayed the record before the REDO location.
+	 * as if we had just replayed the record before the REDO location
+	 * (or the checkpoint record itself, if it's a shutdown checkpoint).
 	 */
 	SpinLockAcquire(&xlogctl->info_lck);
-	xlogctl->replayEndRecPtr = checkPoint.redo;
+	if (checkPoint.redo < RecPtr)
+		xlogctl->replayEndRecPtr = checkPoint.redo;
+	else
+		xlogctl->replayEndRecPtr = EndRecPtr;
 	xlogctl->replayEndTLI = ThisTimeLineID;
-	xlogctl->lastReplayedEndRecPtr = checkPoint.redo;
-	xlogctl->lastReplayedTLI = ThisTimeLineID;
+	xlogctl->lastReplayedEndRecPtr = xlogctl->replayEndRecPtr;
+	xlogctl->lastReplayedTLI = xlogctl->replayEndTLI;
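The two-case initialization in the patch above can be sketched as a
small standalone model. This is not PostgreSQL source, just an
illustration of the rule the diff encodes: for an online checkpoint
(REDO precedes the checkpoint record) replay progress starts at REDO,
while for a shutdown checkpoint (REDO equals the record's own
location) the record counts as already replayed, so progress starts
at its end.

```c
#include <stdint.h>

/* Toy stand-in for PostgreSQL's WAL position type. */
typedef uint64_t XLogRecPtr;

/*
 * redo      - checkPoint.redo, the REDO location in the checkpoint
 * RecPtr    - start of the checkpoint record itself
 * EndRecPtr - first byte after the checkpoint record
 */
static XLogRecPtr
initial_replay_end(XLogRecPtr redo, XLogRecPtr RecPtr, XLogRecPtr EndRecPtr)
{
	if (redo < RecPtr)
		return redo;		/* online checkpoint: replay starts at REDO */
	return EndRecPtr;		/* shutdown checkpoint: record already applied */
}
```

With this initialization, CheckRecoveryConsistency() sees replay
progress that already covers the shutdown checkpoint record, so the
standby reaches consistency immediately after crash recovery.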