Improve handling of parameter differences in physical replication

Peter Eisentraut Thu, 27 Feb 2020 00:24:28 -0800

When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.

The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transactions set appropriately or it won't be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions.

Where this becomes a serious problem is if you have many standbys and you do a failover. If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings.

If we didn't do the hard shutdown and we just let the standby roll on with recovery, nothing bad would happen and it would eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached").

So I think there are better ways to handle this. It might be reasonable to provide options. The attached patch doesn't do that but it would be pretty easy. What the attached patch does is:

Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable. (That could certainly be configurable.)
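
To make the two steps easier to follow, here is a rough standalone sketch in C. It is not the patch code: check_int_parameter, try_reserve_slot, need_restart_flag and recovery_paused are names made up for this illustration, and the pause is modelled as a plain boolean rather than an actual wait loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* All names below are made up for this sketch; they are not PostgreSQL APIs. */

static bool need_restart_flag = false;	/* remembered from the parameter check */
static bool recovery_paused = false;	/* stands in for pausing WAL replay */

/*
 * Step 1: called when the parameter change record is replayed.  A setting
 * that is too low only produces a warning and sets the flag.
 */
static void
check_int_parameter(const char *name, int standby_value, int primary_value)
{
	if (standby_value < primary_value)
	{
		fprintf(stderr, "WARNING: insufficient setting for parameter %s "
				"(%d here, %d on the primary)\n",
				name, standby_value, primary_value);
		need_restart_flag = true;
	}
}

/*
 * Step 2: called when replay needs one more slot of some resource.
 * Returns true if a slot is free.  If none is free and the flag is set,
 * recovery is marked as paused before the caller reports its usual error.
 */
static bool
try_reserve_slot(const char *name, int used, int capacity)
{
	assert(used >= 0 && capacity >= 0);

	if (used < capacity)
		return true;

	if (need_restart_flag)
	{
		fprintf(stderr, "WARNING: recovery paused because of insufficient "
				"setting of parameter %s (currently %d)\n", name, capacity);
		recovery_paused = true;
	}
	return false;
}
```

In this sketch, calling check_int_parameter("max_prepared_transactions", 2, 5) sets the flag but leaves recovery running. Only a later try_reserve_slot("max_prepared_transactions", 2, 2), where all slots are already in use, marks recovery as paused.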

Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.


(This patch was developed together with Simon Riggs.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From f5b4b7fd853b0dba2deea6b1e8290ae4c6df7081 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Thu, 27 Feb 2020 08:50:37 +0100
Subject: [PATCH v1] Improve handling of parameter differences in physical
 replication

When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transactions set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.  Additionally, at that
point we pause recovery, so a hot standby remains usable.
---
 src/backend/access/transam/twophase.c |  3 ++
 src/backend/access/transam/xlog.c     | 56 ++++++++++++++++++++++-----
 src/backend/storage/ipc/procarray.c   | 15 ++++++-
 src/backend/storage/lmgr/lock.c       | 10 +++++
 src/include/access/xlog.h             |  1 +
 5 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5adf956f41..fdac2ed69d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2360,11 +2360,14 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn,
 
        /* Get a free gxact from the freelist */
        if (TwoPhaseState->freeGXacts == NULL)
+       {
+               StandbyParamErrorPauseRecovery("max_prepared_transactions", max_prepared_xacts);
                ereport(ERROR,
                                (errcode(ERRCODE_OUT_OF_MEMORY),
                                 errmsg("maximum number of prepared transactions reached"),
                                 errhint("Increase max_prepared_transactions (currently %d).",
                                                 max_prepared_xacts)));
+       }
        gxact = TwoPhaseState->freeGXacts;
        TwoPhaseState->freeGXacts = gxact->next;
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..71c4f87511 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -257,6 +257,8 @@ bool                InArchiveRecovery = false;
 static bool standby_signal_file_found = false;
 static bool recovery_signal_file_found = false;
 
+static bool need_restart_for_parameter_values = false;
+
 /* Was the last xlog file restored from archive, or local? */
 static bool restoredFromArchive = false;
 
@@ -5970,6 +5972,36 @@ SetRecoveryPause(bool recoveryPause)
        SpinLockRelease(&XLogCtl->info_lck);
 }
 
+/*
+ * If in hot standby, pause recovery because of a parameter conflict.
+ *
+ * Similar to recoveryPausesHere() but with different messaging.  The user
+ * is expected to make the parameter change and restart the server.  If they
+ * just unpause recovery, they will then run into whatever error is after
+ * this function call for the non-hot-standby case.
+ *
+ * param_name is the parameter at fault and currValue its current value, for
+ * producing a message.
+ */
+void
+StandbyParamErrorPauseRecovery(const char *param_name, int currValue)
+{
+       if (!AmStartupProcess() || !need_restart_for_parameter_values)
+               return;
+
+       ereport(WARNING,
+                       (errmsg("recovery paused because of insufficient setting of parameter %s (currently %d)", param_name, currValue),
+                        errdetail("The value must be at least as high as on the primary server."),
+                        errhint("Recovery cannot continue unless the parameter is changed and the server restarted.")));
+
+       SetRecoveryPause(true);
+       while (RecoveryIsPaused())
+       {
+               pg_usleep(1000000L);    /* 1000 ms */
+               HandleStartupProcInterrupts();
+       }
+}
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -6149,16 +6181,20 @@ GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream)
  * Note that text field supplied is a parameter name and does not require
  * translation
  */
-#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \
-do { \
-       if ((currValue) < (minValue)) \
-               ereport(ERROR, \
-                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
-                                errmsg("hot standby is not possible because %s = %d is a lower setting than on the master server (its value was %d)", \
-                                               param_name, \
-                                               currValue, \
-                                               minValue))); \
-} while(0)
+static void
+RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
+{
+       if (currValue < minValue)
+       {
+               ereport(WARNING,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("insufficient setting for parameter %s", param_name),
+                                errdetail("%s = %d is a lower setting than on the master server (where its value was %d).",
+                                                  param_name, currValue, minValue),
+                                errhint("Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.")));
+               need_restart_for_parameter_values = true;
+       }
+}
 
 /*
  * Check to see if required parameters are set high enough on this server
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4a5b26c23d..5b14d4f1a3 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3653,7 +3653,20 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
                 * If it still won't fit then we're out of memory
                 */
                if (head + nxids > pArray->maxKnownAssignedXids)
-                       elog(ERROR, "too many KnownAssignedXids");
+               {
+                       /*
+                        * The error messages here refer to max_connections, but any
+                        * setting that contributes to TOTAL_MAX_CACHED_SUBXIDS would
+                        * work.  But listing them all without differentiation would
+                        * probably be confusing.
+                        */
+                       StandbyParamErrorPauseRecovery("max_connections", MaxConnections);
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_OUT_OF_MEMORY),
+                                        errmsg("out of shared memory"),
+                                        errdetail("There are no more KnownAssignedXids slots."),
+                                        errhint("You might need to increase max_connections.")));
+               }
        }
 
        /* Now we can insert the xids into the space starting at head */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 56dba09299..b4658dac25 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -924,10 +924,13 @@ LockAcquireExtended(const LOCKTAG *locktag,
                        if (locallockp)
                                *locallockp = NULL;
                        if (reportMemoryError)
+                       {
+                               StandbyParamErrorPauseRecovery("max_locks_per_transaction", max_locks_per_xact);
                                 ereport(ERROR,
                                                 (errcode(ERRCODE_OUT_OF_MEMORY),
                                                  errmsg("out of shared memory"),
                                                  errhint("You might need to increase max_locks_per_transaction.")));
+                       }
                        else
                                return LOCKACQUIRE_NOT_AVAIL;
                }
@@ -962,10 +965,13 @@ LockAcquireExtended(const LOCKTAG *locktag,
                if (locallockp)
                        *locallockp = NULL;
                if (reportMemoryError)
+               {
+                       StandbyParamErrorPauseRecovery("max_locks_per_transaction", max_locks_per_xact);
                         ereport(ERROR,
                                         (errcode(ERRCODE_OUT_OF_MEMORY),
                                          errmsg("out of shared memory"),
                                          errhint("You might need to increase max_locks_per_transaction.")));
+               }
                else
                        return LOCKACQUIRE_NOT_AVAIL;
        }
@@ -2747,6 +2753,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
                {
                        LWLockRelease(partitionLock);
                        LWLockRelease(&MyProc->backendLock);
+                       StandbyParamErrorPauseRecovery("max_locks_per_transaction", max_locks_per_xact);
                        ereport(ERROR,
                                        (errcode(ERRCODE_OUT_OF_MEMORY),
                                         errmsg("out of shared memory"),
@@ -4077,6 +4084,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
        if (!lock)
        {
                LWLockRelease(partitionLock);
+               StandbyParamErrorPauseRecovery("max_locks_per_transaction", max_locks_per_xact);
                ereport(ERROR,
                                (errcode(ERRCODE_OUT_OF_MEMORY),
                                 errmsg("out of shared memory"),
@@ -4142,6 +4150,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
                                elog(PANIC, "lock table corrupted");
                }
                LWLockRelease(partitionLock);
+               StandbyParamErrorPauseRecovery("max_locks_per_transaction", max_locks_per_xact);
                ereport(ERROR,
                                (errcode(ERRCODE_OUT_OF_MEMORY),
                                 errmsg("out of shared memory"),
@@ -4434,6 +4443,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
                {
                        LWLockRelease(partitionLock);
                        LWLockRelease(&proc->backendLock);
+                       StandbyParamErrorPauseRecovery("max_locks_per_transaction", max_locks_per_xact);
                        ereport(ERROR,
                                        (errcode(ERRCODE_OUT_OF_MEMORY),
                                         errmsg("out of shared memory"),
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc20..44465b3829 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -286,6 +286,7 @@ extern XLogRecPtr GetXLogInsertRecPtr(void);
 extern XLogRecPtr GetXLogWriteRecPtr(void);
 extern bool RecoveryIsPaused(void);
 extern void SetRecoveryPause(bool recoveryPause);
+extern void StandbyParamErrorPauseRecovery(const char *param_name, int currValue);
 extern TimestampTz GetLatestXTime(void);
 extern TimestampTz GetCurrentChunkReplayStartTime(void);
 
-- 
2.25.0
