Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

Heikki Linnakangas Thu, 13 Feb 2014 04:43:20 -0800

On 02/13/2014 01:37 PM, Kyotaro HORIGUCHI wrote:

At Thu, 13 Feb 2014 10:11:22 +0200, Heikki Linnakangas <[email protected]> wrote in <[email protected]>

On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:

Wouldn't it be better to not archive the old segment, and instead switch
to a new segment after writing the end-of-recovery checkpoint, so that
the segment on the new timeline is archived sooner?


It would be better to zero-fill and switch segments, yes. We should
NEVER be in a position of archiving two different versions of the same
segment.


Ok, I think we're in agreement that that's the way to go for master.


Does this mean that no automatic solution will be provided for this
situation, and that an operator must remove the older segment with the
same segment ID before starting the recovery process?

I don't understand. With the proposed change to never archive a partial segment, there is no segment with the same filename in the archive. There is nothing to remove. Or did you mean that you have to do that in the 9.0-9.2 back-branches, if we only back-patch this change to 9.3?

If it's already in the archive, it's not going to be removed from the
archive.


My understanding of the conclusion so far is to not archive the
older segment at promotion, but as you suggested, that leaves the
story a bit odd. If the conclusion is that no help will be given,
as I now understand it, could you tell me why recovery was
changed to prefer the archive over pg_xlog?

Commit abf5c5c9a4 seems to have changed the behavior, but I can't
find the reason for the change.

Well, it didn't work too well before abf5c5c9a4 either. With your test script, and commit abf5c5c9a4 reverted [1], you got this:


### STAGE 3/3: 2/2 START SERVER IN ARCHIVE RECOVERY MODE 2ND RUN ###

waiting for server to start....LOG:  database system was interrupted; last known up at 2014-02-13 14:17:53 EET
LOG:  starting archive recovery
LOG:  restored log file "00000002.history" from archive
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/000000020000000000000002’: No such file or directory
LOG:  restored log file "000000010000000000000002" from archive
LOG:  record with zero length at 0/2000090
LOG:  invalid primary checkpoint record
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/000000020000000000000002’: No such file or directory
LOG:  restored log file "000000010000000000000002" from archive
LOG:  using previous checkpoint record at 0/2000028
LOG:  consistent recovery state reached at 0/2000028
LOG:  record with zero length at 0/2000090
LOG:  redo is not required
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/000000020000000000000002’: No such file or directory
LOG:  restored log file "000000010000000000000002" from archive
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/00000003.history’: No such file or directory
LOG:  selected new timeline ID: 3
LOG:  restored log file "00000002.history" from archive
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

Note the "invalid primary checkpoint record" message. It restored the partial 000000010000000000000002 file from the archive, and tried to recover using that instead of the 000000020000000000000002 file from pg_xlog. The primary checkpoint record is not present in 000000010000000000000002, so it fell back to recovering from the previous checkpoint.

It was the exact same issue, even though it appeared to work if you don't look carefully. The second recovery never replayed the WAL generated on timeline 2; it branched off directly from timeline 1, at the same point where timeline 2 did. You can see that by looking at the history files. Also, if you create a table and insert a row into it after the first recovery, just before killing the server, you'll see that the row is not there after the second recovery. That's wrong; the data files might already contain some changes from timeline 2, and you end up with a corrupt database. Thanks to commit abf5c5c9a4, the situation is at least detected, and the system refuses to start up.

The behavior where we prefer a segment with a lower TLI from the archive over a file with a higher TLI in pg_xlog actually changed in commit a068c391ab0. Arguably changing it wasn't a good idea, but the problem your test script demonstrates can be fixed by not archiving the partial segment, with no change to the archive/pg_xlog preference. As discussed, archiving a partial segment seems like a bad idea anyway, so let's just stop doing that.

[1] Just applying commit abf5c5c9a4 in reverse doesn't work cleanly due to later changes to the code, so I did this instead to simulate the old behavior:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6a4a504..352d3f7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4366,6 +4366,7 @@ readRecoveryCommandFile(void)

        /* Enable fetching from archive recovery area */
        ArchiveRecoveryRequested = true;
+       InArchiveRecovery = true;

        /*
         * If user specified recovery_target_timeline, validate it or compute the

- Heikki


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
