On Mon, Feb 1, 2021 at 1:41 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> >
> > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar <dilipbal...@gmail.com> 
> > wrote in
> > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> > > >
> > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA <nag...@sraoss.co.jp> wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > > > Dilip Kumar <dilipbal...@gmail.com> wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar <dilipbal...@gmail.com> 
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA <nag...@sraoss.co.jp> 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > > > Dilip Kumar <dilipbal...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > > > > > <sawada.m...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > > > > > <robertmh...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > > > > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > > > > > > > > > > > +1 to just show the recovery pause state in the output 
> > > > > > > > > > > > of
> > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, 
> > > > > > > > > > > > when "is" exists
> > > > > > > > > > > > in a function, I expect a boolean output. Others may 
> > > > > > > > > > > > have better
> > > > > > > > > > > > thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Maybe we should leave the existing function 
> > > > > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > > > > alone and add a new one with the name you suggest that 
> > > > > > > > > > > returns text.
> > > > > > > > > > > That would create less burden for tool authors.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > > > >
> > > > > > > > This means pg_is_wal_replay_paused is left without any change 
> > > > > > > > and this
> > > > > > > > returns whether pause is requested or not? If so, it seems good 
> > > > > > > > to modify
> > > > > > > > the documentation of this function in order to note that this 
> > > > > > > > could not
> > > > > > > > return the actual pause state.
> > > > > > >
> > > > > > > Yes, we can say that it will return true if the replay pause is
> > > > > > > requested.  I am changing that in my new patch.
> > > > > >
> > > > > > I have modified the patch, changes
> > > > > >
> > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > > > the pause request state
> > > > > > - Now, we are not waiting for the recovery to actually get paused 
> > > > > > so I
> > > > > > think it doesn't make sense to put a lot of checkpoints to check the
> > > > > > pause requested so I have removed that check from the
> > > > > > recoveryApplyDelay but I think it better we still keep that check in
> > > > > > the WaitForWalToBecomeAvailable because it can wait forever before 
> > > > > > the
> > > > > > next wal get available.
> > > > >
> > > > > I think basically the check in WaitForWalToBecomeAvailable is 
> > > > > independent
> > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting 
> > > > > the
> > > > > actual pause state.  This function could just return 'pause requested'
> > > > > if a pause is requested during waiting for WAL.
> > > > >
> > > > > However, I agree the change to allow recovery to transit the state to
> > > > > 'paused' during WAL waiting because 'paused' has more useful 
> > > > > information
> > > > > for users than 'pause requested'.  Returning 'paused' lets users know
> > > > > clearly that no more WAL are applied until recovery is resumed.  On 
> > > > > the
> > > > > other hand, when 'pause requested' is returned, user can't say whether
> > > > > the next WAL wiill be applied or not from this information.
> > > > >
> > > > > For the same reason, I think it is also useful to call 
> > > > > recoveryPausesHere
> > > > > in recoveryApplyDelay.
> > > >
> > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> > > > available so it can not be controlled by user so it is good to put a
> > > > check for the recovery pause,  however recoveryApplyDelay wait for the
> > > > apply delay which is configured by user and it is predictable value by
> > > > the user.  I don't have much objection to putting that check in the
> > > > recoveryApplyDelay as well but I feel it is not necessary.  Any other
> > > > thoughts on this?
> > > >
> > > > > In addition, in RecoveryRequiresIntParameter, recovery should get 
> > > > > paused
> > > > > if a parameter value has a problem.  However, 
> > > > > pg_get_wal_replay_pause_state
> > > > > will return 'pause requested' in this case. So, I think, we should 
> > > > > pass
> > > > > RECOVERY_PAUSED to SetRecoveryPause() instead of 
> > > > > RECOVERY_PAUSE_REQUESTED,
> > > > > or call CheckAndSetRecoveryPause() in the loop like 
> > > > > recoveryPausesHere().
> > > >
> > > > Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
> > > > this, thanks for noticing this.
> > >
> > > I have changed this in the new patch.
> >
> > It seems to work well. The checkpoints seems to be placed properly.
>
> Okay
>
> > +SetRecoveryPause(RecoveryPauseState state)
> >  {
> > +       Assert(state >= RECOVERY_NOT_PAUSED && state <= RECOVERY_PAUSED);
> >
> > I'm not sure that state worth FATAL.  Isn't it enough to just ERROR
> > out like XLogFileRead?
>
> Yeah, that makes sense to me.
>
> > CheckAndSetRecovery() has only one caller.  I think it's better to
> > write the code directly.
>
> Okay, I will change.
>
> > I think the documentation of pg_wal_replay_pause needs to be a bit
> > more detailed about the difference between the two states "pause
> > requested" and "paused". Something like "A request doesn't mean that
> > recovery stops right away. If you want a guarantee that recovery is
> > actually paused, you need to check for the recovery pause state
> > returned by pg_wal_replay_pause_state(). Note that
> > pg_is_wal_repay_paused() returns whether a request is made."
>
> That seems like better idea, I will change.
>

Please find an updated patch which addresses these comments.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 5a60cb3e12bbb28fdcb205a0fb1940f136f4e389 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v10] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just check whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml                 | 31 +++++++++++++---
 src/backend/access/transam/xlog.c      | 64 +++++++++++++++++++++++++---------
 src/backend/access/transam/xlogfuncs.c | 50 +++++++++++++++++++++++---
 src/include/access/xlog.h              | 12 +++++--
 src/include/catalog/pg_proc.dat        |  4 +++
 5 files changed, 133 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f30eaa3..8a05918 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25251,7 +25251,23 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
-        Returns true if recovery is paused.
+        Returns true if recovery pause is requested.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_get_wal_replay_pause_state</primary>
+        </indexterm>
+        <function>pg_get_wal_replay_pause_state</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Returns recovery pause state, the return values are <literal>not paused
+        </literal> if pause is not requested, <literal>pause requested
+        </literal> if pause is requested but recovery is not yet paused and,
+        <literal>paused</literal> if the recovery is actually paused.
        </para></entry>
       </row>
 
@@ -25290,10 +25306,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
         <returnvalue>void</returnvalue>
        </para>
        <para>
-        Pauses recovery.  While recovery is paused, no further database
-        changes are applied.  If hot standby is active, all new queries will
-        see the same consistent snapshot of the database, and no further query
-        conflicts will be generated until recovery is resumed.
+        Request to pause recovery.  A request doesn't mean that recovery stops
+        right away.  If you want a guarantee that recovery is actually paused,
+        you need to check for the recovery pause state returned by
+        <function>pg_get_wal_replay_pause_state()</function>.  Note that
+        <function>pg_is_wal_repay_paused()</function> returns whether a request
+        is made.  While recovery is paused, no further database changes are applied.
+        If hot standby is active, all new queries will see the same consistent
+        snapshot of the database, and no further query conflicts will be generated
+        until recovery is resumed.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f03bd47..fb16293 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -721,8 +721,8 @@ typedef struct XLogCtlData
 	 * only relevant for replication or archive recovery
 	 */
 	TimestampTz currentChunkStartTime;
-	/* Are we requested to pause recovery? */
-	bool		recoveryPause;
+	/* Recovery pause state */
+	RecoveryPauseState	recoveryPauseState;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -6019,7 +6019,7 @@ recoveryStopsAfter(XLogReaderState *record)
 }
 
 /*
- * Wait until shared recoveryPause flag is cleared.
+ * Wait until shared recoveryPauseState is set to RECOVERY_NOT_PAUSED.
  *
  * endOfRecovery is true if the recovery target is reached and
  * the paused state starts at the end of recovery because of
@@ -6049,34 +6049,55 @@ recoveryPausesHere(bool endOfRecovery)
 				(errmsg("recovery has paused"),
 				 errhint("Execute pg_wal_replay_resume() to continue.")));
 
-	while (RecoveryIsPaused())
+	/* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */
+	while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
 	{
 		HandleStartupProcInterrupts();
+
 		if (CheckForStandbyTrigger())
 			return;
 		pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+
+		/*
+		 * If recovery pause is requested then set it paused.  While we are in
+		 * the loop, user might resume and pause again so set this every time.
+		 */
+		SpinLockAcquire(&XLogCtl->info_lck);
+		if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
+			XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
+		SpinLockRelease(&XLogCtl->info_lck);
+
 		pg_usleep(1000000L);	/* 1000 ms */
 		pgstat_report_wait_end();
 	}
 }
 
-bool
-RecoveryIsPaused(void)
+/*
+ * Get the current state of the recovery pause request.
+ */
+RecoveryPauseState
+GetRecoveryPauseState(void)
 {
-	bool		recoveryPause;
+	RecoveryPauseState	state;
 
 	SpinLockAcquire(&XLogCtl->info_lck);
-	recoveryPause = XLogCtl->recoveryPause;
+	state = XLogCtl->recoveryPauseState;
 	SpinLockRelease(&XLogCtl->info_lck);
 
-	return recoveryPause;
+	return state;
 }
 
+/*
+ * Set the recovery pause state.
+ */
 void
-SetRecoveryPause(bool recoveryPause)
+SetRecoveryPause(RecoveryPauseState state)
 {
+	if (state < RECOVERY_NOT_PAUSED || state > RECOVERY_PAUSED)
+		elog(ERROR, "invalid recovery pause state %d", state);
+
 	SpinLockAcquire(&XLogCtl->info_lck);
-	XLogCtl->recoveryPause = recoveryPause;
+	XLogCtl->recoveryPauseState = state;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
 
@@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue
 							   currValue,
 							   minValue)));
 
-			SetRecoveryPause(true);
+			SetRecoveryPause(RECOVERY_PAUSED);
 
 			ereport(LOG,
 					(errmsg("recovery has paused"),
 					 errdetail("If recovery is unpaused, the server will shut down."),
 					 errhint("You can then restart the server after making the necessary configuration changes.")));
 
-			while (RecoveryIsPaused())
+			while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
 			{
 				HandleStartupProcInterrupts();
 
@@ -7194,7 +7215,7 @@ StartupXLOG(void)
 		XLogCtl->lastReplayedTLI = XLogCtl->replayEndTLI;
 		XLogCtl->recoveryLastXTime = 0;
 		XLogCtl->currentChunkStartTime = 0;
-		XLogCtl->recoveryPause = false;
+		XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED;
 		SpinLockRelease(&XLogCtl->info_lck);
 
 		/* Also ensure XLogReceiptTime has a sane value */
@@ -7298,7 +7319,8 @@ StartupXLOG(void)
 				 * otherwise would is a minor issue, so it doesn't seem worth
 				 * adding another spinlock cycle to prevent that.
 				 */
-				if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
+				if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
+					RECOVERY_PAUSE_REQUESTED)
 					recoveryPausesHere(false);
 
 				/*
@@ -7323,7 +7345,8 @@ StartupXLOG(void)
 					 * here otherwise pausing during the delay-wait wouldn't
 					 * work.
 					 */
-					if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
+					if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
+						RECOVERY_PAUSE_REQUESTED)
 						recoveryPausesHere(false);
 				}
 
@@ -7497,7 +7520,7 @@ StartupXLOG(void)
 						proc_exit(3);
 
 					case RECOVERY_TARGET_ACTION_PAUSE:
-						SetRecoveryPause(true);
+						SetRecoveryPause(RECOVERY_PAUSE_REQUESTED);
 						recoveryPausesHere(true);
 
 						/* drop into promote */
@@ -12625,6 +12648,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				elog(ERROR, "unexpected WAL source %d", currentSource);
 		}
 
+		/* test for recovery pause, if user has requested the pause */
+		if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
+			RECOVERY_PAUSE_REQUESTED)
+			recoveryPausesHere(false);
+
+		now = GetCurrentTimestamp();
+
 		/*
 		 * This possibly-long loop needs to handle interrupts of startup
 		 * process.
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab3..349ad0c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -517,7 +517,7 @@ pg_walfile_name(PG_FUNCTION_ARGS)
 }
 
 /*
- * pg_wal_replay_pause - pause recovery now
+ * pg_wal_replay_pause - Request to pause recovery
  *
  * Permission checking for this function is managed through the normal
  * GRANT system.
@@ -538,7 +538,10 @@ pg_wal_replay_pause(PG_FUNCTION_ARGS)
 				 errhint("%s cannot be executed after promotion is triggered.",
 						 "pg_wal_replay_pause()")));
 
-	SetRecoveryPause(true);
+	SetRecoveryPause(RECOVERY_PAUSE_REQUESTED);
+
+	/* wake up the recovery process so that it can process the pause request */
+	WakeupRecovery();
 
 	PG_RETURN_VOID();
 }
@@ -565,7 +568,7 @@ pg_wal_replay_resume(PG_FUNCTION_ARGS)
 				 errhint("%s cannot be executed after promotion is triggered.",
 						 "pg_wal_replay_resume()")));
 
-	SetRecoveryPause(false);
+	SetRecoveryPause(RECOVERY_NOT_PAUSED);
 
 	PG_RETURN_VOID();
 }
@@ -582,7 +585,46 @@ pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
 				 errmsg("recovery is not in progress"),
 				 errhint("Recovery control functions can only be executed during recovery.")));
 
-	PG_RETURN_BOOL(RecoveryIsPaused());
+	PG_RETURN_BOOL(GetRecoveryPauseState() != RECOVERY_NOT_PAUSED);
+}
+
+/*
+ * pg_get_wal_replay_pause_state - Returns the recovery pause state.
+ *
+ * Returned values:
+ *
+ * 'not paused' - if pause is not requested
+ * 'pause requested' - if pause is requested but recovery is not yet paused
+ * 'paused' - if recovery is paused
+ */
+Datum
+pg_get_wal_replay_pause_state(PG_FUNCTION_ARGS)
+{
+	char	*state;
+
+	if (!RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("recovery is not in progress"),
+				 errhint("Recovery control functions can only be executed during recovery.")));
+
+	/* get the recovery pause state */
+	switch(GetRecoveryPauseState())
+	{
+		case RECOVERY_NOT_PAUSED:
+			state = "not paused";
+			break;
+		case RECOVERY_PAUSE_REQUESTED:
+			state = "pause requested";
+			break;
+		case RECOVERY_PAUSED:
+			state = "paused";
+			break;
+		default:
+			elog(ERROR, "invalid recovery pause state");
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(state));
 }
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec107..7533c48 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -174,6 +174,14 @@ typedef enum RecoveryState
 	RECOVERY_STATE_DONE			/* currently in production */
 } RecoveryState;
 
+/* Recovery pause states */
+typedef enum RecoveryPauseState
+{
+	RECOVERY_NOT_PAUSED = 0,		/* pause not requested */
+	RECOVERY_PAUSE_REQUESTED = 1,	/* pause requested, but yet paused */
+	RECOVERY_PAUSED = 2				/* recovery is paused */
+} RecoveryPauseState;
+
 extern PGDLLIMPORT int wal_level;
 
 /* Is WAL archiving enabled (always or only while server is running normally)? */
@@ -310,8 +318,8 @@ extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
 extern XLogRecPtr GetXLogReplayRecPtr(TimeLineID *replayTLI);
 extern XLogRecPtr GetXLogInsertRecPtr(void);
 extern XLogRecPtr GetXLogWriteRecPtr(void);
-extern bool RecoveryIsPaused(void);
-extern void SetRecoveryPause(bool recoveryPause);
+extern RecoveryPauseState GetRecoveryPauseState(void);
+extern void SetRecoveryPause(RecoveryPauseState state);
 extern TimestampTz GetLatestXTime(void);
 extern TimestampTz GetCurrentChunkReplayStartTime(void);
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4..12ab479 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6222,6 +6222,10 @@
   proname => 'pg_is_wal_replay_paused', provolatile => 'v',
   prorettype => 'bool', proargtypes => '',
   prosrc => 'pg_is_wal_replay_paused' },
+{ oid => '1137', descr => 'get wal replay is pause state',
+  proname => 'pg_get_wal_replay_pause_state', provolatile => 'v',
+  prorettype => 'text', proargtypes => '',
+  prosrc => 'pg_get_wal_replay_pause_state' },
 
 { oid => '2621', descr => 'reload configuration files',
   proname => 'pg_reload_conf', provolatile => 'v', prorettype => 'bool',
-- 
1.8.3.1

Reply via email to