Here are some review comments for v750002. (this is a WIP but this is what I found so far...)
====== doc/src/sgml/protocol.sgml 1. > > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) # > > > > <para> > > - If true, the slot is enabled to be synced to the standbys. > > + If true, the slot is enabled to be synced to the standbys > > + so that logical replication can be resumed after failover. > > </para> > > > > This also should have the sentence "The default is false.", e.g. the > > same as the same option in CREATE_REPLICATION_SLOT says. > > I have not added this. I feel the default value related details should > be present in the 'CREATE' part, it is not meaningful for the "ALTER" > part. ALTER does not have any defaults, it just modifies the options > given by the user. You are correct. My mistake. ====== src/backend/postmaster/bgworker.c 2. #include "replication/logicalworker.h" +#include "replication/worker_internal.h" #include "storage/dsm.h" Is this change needed when the rest of the code is removed? ====== src/backend/replication/logical/slotsync.c 3. synchronize_one_slot > > 3. > > + /* > > + * Make sure that concerned WAL is received and flushed before syncing > > + * slot to target lsn received from the primary server. > > + * > > + * This check will never pass if on the primary server, user has > > + * configured standby_slot_names GUC correctly, otherwise this can hit > > + * frequently. > > + */ > > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > > + if (remote_slot->confirmed_lsn > latestFlushPtr) > > > > BEFORE > > This check will never pass if on the primary server, user has > > configured standby_slot_names GUC correctly, otherwise this can hit > > frequently. > > > > SUGGESTION (simpler way to say the same thing?) > > This will always be the case unless the standby_slot_names GUC is not > > correctly configured on the primary server. > > It is not true. It will not hit this condition "always" but has higher > chances to hit it when standby_slot_names is not configured. I think > you meant 'unless the standby_slot_names GUC is correctly configured'. > I feel the current comment gives clear info (less confusing) and thus > I have not changed it for the time being. I can consider if I get more > comments there. Hmm. I meant what I wrote. The "This" of my suggested text refers to the previous sentence in the comment (not about "hitting" ?? your condition). TBH, regardless of the wording you choose, I think it will be much clearer to move the comment to be inside the if. SUGGESTION /* * Make sure that concerned WAL is received and flushed before syncing * slot to target lsn received from the primary server. */ latestFlushPtr = GetStandbyFlushRecPtr(NULL); if (remote_slot->confirmed_lsn > latestFlushPtr) { /* * Can get here only when if GUC 'standby_slot_names' on the primary * server was not configured correctly. */ ... } ~~~ 4. +static bool +validate_parameters_and_get_dbname(char **dbname) +{ + /* + * A physical replication slot(primary_slot_name) is required on the + * primary to ensure that the rows needed by the standby are not removed + * after restarting, so that the synchronized slot on the standby will not + * be invalidated. + */ + if (PrimarySlotName == NULL || *PrimarySlotName == '\0') + { + ereport(LOG, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_slot_name")); + return false; + } + + /* + * hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. + */ + if (!hot_standby_feedback) + { + ereport(LOG, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be enabled.", "hot_standby_feedback")); + return false; + } + + /* + * Logical decoding requires wal_level >= logical and we currently only + * synchronize logical slots. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + { + ereport(LOG, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"wal_level\" must be >= logical.")); + return false; + } + + /* + * The primary_conninfo is required to make connection to primary for + * getting slots information. + */ + if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') + { + ereport(LOG, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_conninfo")); + return false; + } + + /* + * The slot sync worker needs a database connection for walrcv_exec to + * work. + */ + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (*dbname == NULL) + { + ereport(LOG, + + /* + * translator: 'dbname' is a specific option; %s is a GUC variable + * name + */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("'dbname' must be specified in \"%s\".", "primary_conninfo")); + return false; + } + + return true; +} I wonder if it is better to log all the problems in one go instead of making users stumble onto them one at a time after fixing one and then hitting the next problem. e.g. just set some variable "all_ok = false;" each time instead of all the "return false;" Then at the end of the function just "return all_ok;" ====== Kind Regards, Peter Smith. Fujitsu Australia