On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:
On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
The behavior where we prefer a segment from archive with lower TLI over
a file with higher TLI in pg_xlog actually changed in commit
a068c391ab0. Arguably changing it wasn't a good idea, but the problem
your test script demonstrates can be fixed by not archiving the partial
segment, with no change to the preference of archive/pg_xlog. As
discussed, archiving a partial segment seems like a bad idea anyway, so
let's just stop doing that.

After some further thought, while not archiving the partial segment
fixes your test script, it's not enough to fix all variants of the
problem. Even if archive recovery doesn't archive the last, partial,
segment, if the original master server is still running, it's entirely
possible that it fills the segment and archives it. In that case,
archive recovery will again prefer the archived segment with lower TLI
over the segment with newer TLI in pg_xlog.

So I agree we should commit the patch you posted (or something to that
effect). The change to not archive the last segment still seems like a
good idea, but perhaps we should only do that in master.

To draw this to conclusion, barring any further insights to this, I'm going to commit the attached patch to master and REL9_3_STABLE. Please have a look at the patch, to see if I'm missing something. I modified the state machine to skip over XLOG_FROM_XLOG state, if reading in XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and pg_xlog together, and then pg_xlog alone, which is pointless.

In master, I'm also going to remove the "archive last segment on old timeline" code.

- Heikki
>From 15b0cadb0006b97f87257f4f903db8d33411ada9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Thu, 13 Feb 2014 19:16:00 +0200
Subject: [PATCH 1/1] Change the order that pg_xlog and WAL archive are polled
 for WAL segments.

If there is a WAL segment with same ID but different TLI present in both
the WAL archive and pg_xlog, prefer the one with higher TLI. Before this
patch, the archive was polled first, for all expected TLIs, and only if no
file was found was pg_xlog scanned. This was a change in behavior from 9.3,
which first scanned archive and pg_xlog for the highest TLI, then archive
and pg_xlog for the next highest TLI and so forth. This patch reverts the
behavior back to what it was in 9.2.

The reason for this is that if for example you try to do archive recovery
to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is
not archived yet, we would replay past the timeline switch point on
timeline 1 using the archived files, before even looking timeline 2's files
in pg_xlog

Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior
was changed.
---
 src/backend/access/transam/xlog.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6a4a504..b2d8e1a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9576,17 +9576,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	/*-------
 	 * Standby mode is implemented by a state machine:
 	 *
-	 * 1. Read from archive (XLOG_FROM_ARCHIVE)
-	 * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG)
-	 * 3. Check trigger file
-	 * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM)
-	 * 5. Rescan timelines
-	 * 6. Sleep 5 seconds, and loop back to 1.
+	 * 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just
+	 *    pg_xlog (XLOG_FROM_XLOG)
+	 * 2. Check trigger file
+	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+	 * 4. Rescan timelines
+	 * 5. Sleep 5 seconds, and loop back to 1.
 	 *
 	 * Failure to read from the current source advances the state machine to
-	 * the next state. In addition, successfully reading a file from pg_xlog
-	 * moves the state machine from state 2 back to state 1 (we always prefer
-	 * files in the archive over files in pg_xlog).
+	 * the next state.
 	 *
 	 * 'currentSource' indicates the current state. There are no currentSource
 	 * values for "check trigger", "rescan timelines", and "sleep" states,
@@ -9614,9 +9612,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 			switch (currentSource)
 			{
 				case XLOG_FROM_ARCHIVE:
-					currentSource = XLOG_FROM_PG_XLOG;
-					break;
-
 				case XLOG_FROM_PG_XLOG:
 
 					/*
@@ -9781,7 +9776,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				 * Try to restore the file from archive, or read an existing
 				 * file from pg_xlog.
 				 */
-				readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
+				readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
+											  currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource);
 				if (readFile >= 0)
 					return true;	/* success! */
 
-- 
1.8.5.3

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to