Re: [BUGS] PITR potentially broken in 9.2

Andres Freund Wed, 05 Dec 2012 13:57:04 -0800

On 2012-12-05 16:15:38 -0500, Tom Lane wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
> > On 5 December 2012 18:48, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> On further thought, it seems like recovery_pause_at_target is rather
> >> misdesigned anyway, and taking recovery target parameters from
> >> recovery.conf is an obsolete API that was designed in a world before hot
> >> standby.  What I suggest people really want, if they're trying to
> >> interactively determine how far to roll forward, is this:
> >> ...
>
> > Can't remember why we didn't go for the full API last time. I'll have
> > another go, in HEAD.
>
> That's fine, but the immediate question is what are we doing to fix
> the back branches.  I think everyone is clear that we should be testing
> LocalHotStandbyActive rather than precursor conditions to see if a pause
> is allowed, but are we going to do anything more than that?


I'd like to have inclusive/non-inclusive stops some resemblance of
sanity.

Raw patch including your earlier LocalHotStandbyActive one attached.

> The only other thing I really wanted to do is not have the in-loop pause
> occur after we've taken actions that are effectively part of the WAL
> apply step.  I think ideally it should happen just before or after the
> recoveryStopsHere stanza.  Last night I worried about adding an extra
> spinlock acquire to make that work, but today I wonder if we couldn't
> get away with just a naked
>
>       if (xlogctl->recoveryPause)
>               recoveryPausesHere();

> The argument for this is that although we might fetch a slightly stale
> value of the shared variable, it can't be very stale --- certainly no
> older than the spinlock acquisition near the bottom of the previous
> iteration of the loop.  And this is a highly asynchronous feature
> anyhow, so fuzziness of plus or minus one WAL record in when the pause
> gets honored is not going to be detectable.  Hence an extra spinlock
> acquisition is not worth the cost.

Seems safe enough to me.  But I am not sure its necessary, if we move
the recoveryPausesHere() down after replaying the record, fetching
inside the spinlock for recoveryLastRecPtr seems to be natural enough.

Greetings,

Andres Freund

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 187b609..e962f62 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5923,6 +5923,10 @@ recoveryStopsHere(XLogRecord *record, bool *includeThis)
 static void
 recoveryPausesHere(void)
 {
+	/* Don't pause unless users can connect! */
+	if (!LocalHotStandbyActive)
+		return;
+
 	ereport(LOG,
 			(errmsg("recovery has paused"),
 			 errhint("Execute pg_xlog_replay_resume() to continue.")));
@@ -6697,19 +6701,18 @@ StartupXLOG(void)
 				 */
 				if (recoveryStopsHere(record, &recoveryApply))
 				{
-					/*
-					 * Pause only if users can connect to send a resume
-					 * message
-					 */
-					if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
-					{
+					if (recoveryPauseAtTarget)
 						SetRecoveryPause(true);
-						recoveryPausesHere();
-					}
+
 					reachedStopPoint = true;	/* see below */
 					recoveryContinue = false;
+
+					/* break out if we have non-inclusive recovery target */
 					if (!recoveryApply)
+					{
+						recoveryPausesHere();
 						break;
+					}
 				}
 
 				/* Setup error traceback support for ereport() */
@@ -6737,19 +6740,14 @@ StartupXLOG(void)
 				/*
 				 * Update shared replayEndRecPtr before replaying this record,
 				 * so that XLogFlush will update minRecoveryPoint correctly.
+				 *
+				 * While we have the lock, also check for a pause request.
 				 */
 				SpinLockAcquire(&xlogctl->info_lck);
 				xlogctl->replayEndRecPtr = EndRecPtr;
-				recoveryPause = xlogctl->recoveryPause;
 				SpinLockRelease(&xlogctl->info_lck);
 
 				/*
-				 * Pause only if users can connect to send a resume message
-				 */
-				if (recoveryPause && standbyState == STANDBY_SNAPSHOT_READY)
-					recoveryPausesHere();
-
-				/*
 				 * If we are attempting to enter Hot Standby mode, process
 				 * XIDs we see
 				 */
@@ -6790,10 +6788,19 @@ StartupXLOG(void)
 				 */
 				SpinLockAcquire(&xlogctl->info_lck);
 				xlogctl->recoveryLastRecPtr = EndRecPtr;
+				recoveryPause = xlogctl->recoveryPause;
 				SpinLockRelease(&xlogctl->info_lck);
 
 				LastRec = ReadRecPtr;
 
+
+				if (recoveryPause)
+					recoveryPausesHere();
+
+				/* if we have inclusive recovery target */
+				if (!recoveryStopsHere)
+					break;
+
 				record = ReadRecord(NULL, LOG, false);
 			} while (record != NULL && recoveryContinue);

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] PITR potentially broken in 9.2

Reply via email to