On 2020/02/27 17:23, Peter Eisentraut wrote:
When certain parameters are changed on a physical replication primary, this is 
communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record.  The 
standby then checks whether its own settings are at least as big as the ones on 
the primary.  If not, the standby shuts down with a fatal error.
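
For reference, the check is essentially a per-parameter comparison done when
that record is replayed; roughly like this (a simplified sketch from memory of
CheckRequiredParameterValues()/RecoveryRequiresIntParameter() in xlog.c, not a
verbatim quote):

    /* sketch, modeled on xlog.c; exact error wording differs */
    static void
    RecoveryRequiresIntParameter(const char *param_name,
                                 int currValue, int minValue)
    {
        if (currValue < minValue)
            ereport(FATAL,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("hot standby is not possible because %s = %d is a lower setting than on the primary server (its value was %d)",
                            param_name, currValue, minValue)));
    }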

The correspondence of settings between primary and standby is required because 
those settings influence certain shared memory sizings that are required for 
processing WAL records that the primary might send.  For example, if the 
primary sends a prepared transaction, the standby must have had 
max_prepared_transactions set appropriately or it won't be able to process those 
WAL records.
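
Concretely, the shared memory reserved for prepared-transaction state is sized
from that setting at server start; a sketch from memory of TwoPhaseShmemSize()
in twophase.c (simplified):

    /* how max_prepared_transactions drives shared memory sizing */
    Size
    TwoPhaseShmemSize(void)
    {
        Size        size;

        /* the fixed struct, the array of pointers, and the per-xact structs */
        size = offsetof(TwoPhaseStateData, prepXacts);
        size = add_size(size, mul_size(max_prepared_xacts,
                                       sizeof(GlobalTransaction)));
        size = MAXALIGN(size);
        size = add_size(size, mul_size(max_prepared_xacts,
                                       sizeof(GlobalTransactionData)));

        return size;
    }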

However, fatally shutting down the standby immediately upon receipt of the 
parameter change record might be a bit of an overreaction.  The resources 
related to those settings are not required immediately at that point, and might 
never be required if the activity on the primary does not exhaust all those 
resources.  An extreme example is raising max_prepared_transactions on the 
primary but never actually using prepared transactions.

Where this becomes a serious problem is if you have many standbys and you do a 
failover.  If the newly promoted standby happens to have a higher setting for 
one of the relevant parameters, all the other standbys that have followed it 
then shut down immediately and won't be able to continue until you change all 
their settings.

If we didn't do the hard shutdown and just let the standby roll on with recovery, 
nothing bad would happen, and it would eventually produce an appropriate error 
when those resources are required (e.g., "maximum number of prepared 
transactions reached").

So I think there are better ways to handle this.  It might be reasonable to 
provide options.  The attached patch doesn't do that, but it would be pretty 
easy to add.  What the attached patch does is:

Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only 
issue a warning and set a global flag if there is a problem.  Then, when we 
actually hit the resource issue and the flag is set, we issue another warning 
message with relevant information.  Additionally, at that point we pause 
recovery instead of shutting down, so a hot standby remains usable.  (That 
could certainly be configurable.)
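
In rough pseudocode, the flow looks like this (a sketch of the behaviour
described above only; the names are illustrative and not necessarily what the
attached patch uses):

    /* Illustrative sketch, not the actual patch. */
    static bool insufficient_parameter_settings = false;   /* hypothetical flag */

    /* On replay of XLOG_PARAMETER_CHANGE: warn instead of FATAL. */
    if (standby_value < primary_value)
    {
        ereport(WARNING,
                (errmsg("setting %s is too low on this standby", param_name)));
        insufficient_parameter_settings = true;
    }

    /* Later, when recovery actually runs out of the resource: */
    if (insufficient_parameter_settings)
    {
        ereport(WARNING,
                (errmsg("recovery cannot continue because of insufficient parameter settings"),
                 errhint("Increase the setting and restart the server.")));
        SetRecoveryPause(true);   /* pause recovery so the hot standby stays usable */
    }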

+1
Btw., I think the current setup is slightly buggy.  The MaxBackends value that 
is used to size shared memory is computed as MaxConnections + 
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we 
don't track autovacuum_max_workers in WAL.
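
For reference (quoting from memory, so please double-check against the sources),
the sizing and the WAL record are roughly:

    /* postinit.c, InitializeMaxBackends(): what shared memory is sized for */
    MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
        max_worker_processes + max_wal_senders;

    /* pg_control.h, xl_parameter_change: what the primary actually logs.
     * Note there is no autovacuum_max_workers field. */
    typedef struct xl_parameter_change
    {
        int         MaxConnections;
        int         max_worker_processes;
        int         max_wal_senders;
        int         max_prepared_xacts;
        int         max_locks_per_xact;
        int         wal_level;
        bool        wal_log_hints;
        bool        track_commit_timestamp;
    } xl_parameter_change;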

Maybe this is because autovacuum doesn't work during recovery?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

