On Mon, Apr 3, 2023 at 4:39 AM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > > > From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001 > > From: bdrouvotAWS <bdrou...@amazon.com> > > Date: Tue, 7 Feb 2023 08:59:47 +0000 > > Subject: [PATCH v52 3/6] Allow logical decoding on standby. > > > > Allow a logical slot to be created on standby. Restrict its usage > > or its creation if wal_level on primary is less than logical. > > During slot creation, it's restart_lsn is set to the last replayed > > LSN. Effectively, a logical slot creation on standby waits for an > > xl_running_xact record to arrive from primary. > > Hmm, not sure if it really applies here, but this sounds similar to > issues with track_commit_timestamps: namely, if the primary has it > enabled and you start a standby with it enabled, that's fine; but if the > primary is later shut down (but the standby isn't) and then the primary > restarted with a lesser value, then the standby would misbehave without > any obvious errors. >
IIUC, the patch deals it by invalidating logical slots while replaying the XLOG_PARAMETER_CHANGE record on standby. Then later during decoding, if it encounters XLOG_PARAMETER_CHANGE, and wal_level from primary has been reduced, it will return an error. There is a race condition here as explained in the patch as follows: + /* + * If wal_level on primary is reduced to less than logical, then we + * want to prevent existing logical slots from being used. + * Existing logical slots on standby get invalidated when this WAL + * record is replayed; and further, slot creation fails when the + * wal level is not sufficient; but all these operations are not + * synchronized, so a logical slot may creep in while the wal_level + * is being reduced. Hence this extra check. + */ + if (xlrec->wal_level < WAL_LEVEL_LOGICAL) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("logical decoding on standby requires " + "wal_level >= logical on master"))); Now, during this race condition, say not only does a logical slot creep in but also one tries to decode WAL using the same then some misbehavior is expected. I have not tried this so not sure if this is really a problem but are you worried about something along those lines? > If that is a real problem, then perhaps you can > solve it by copying some of the logic from track_commit_timestamps, > which took a large number of iterations to get right. > IIUC, track_commit_timestamps deactivates the CommitTs module (by using state in the shared memory) when replaying the XLOG_PARAMETER_CHANGE record. Then later using that state it gives an error from the appropriate place in the CommitTs module. If my understanding is correct then that appears to be a better design than what the patch is currently doing. Also, the error message used in error_commit_ts_disabled() seems to be better than the current one. -- With Regards, Amit Kapila.