On Wed, Jun 11, 2025 at 2:31 AM Masahiko Sawada <[email protected]> wrote:
>
> I think it's the user's responsibility to keep at least one logical
> slot. It seems that setting wal_level to 'logical' would be the most
> reliable solution for this case. We might want to provide a way to
> keep 'logical' WAL level somehow but I don't have a good idea for now.
>
Okay, Let me think more on this.
>
> Considering cascading replication cases too, 2) could be tricky as
> cascaded standbys need to propagate the information of logical slot
> creation up to the most upstream server.
>
Yes, I understand the challenges here.
Thanks for the v2 patches, few concerns:
1)
Now when the slot on standby is invalidated due to effective_wal_level
switched back to replica and if we restart standby, it fails to
restart even if wal_level is explicitly changed to logical in conf
file.
FATAL: logical replication slot "slot_st" exists, but logical
decoding is not enabled
HINT: Change "wal_level" to be "replica" or higher.
2)
I see that when primary switches back its effective wal_level to
replica while standby has wal_level=logical in conf file, then standby
has this status:
postgres=# show wal_level;
wal_level
-----------
logical
postgres=# show effective_wal_level;
effective_wal_level
---------------------
replica
Is this correct? Can effective_wal_level be < wal_level anytime? I
feel it can be greater but never lesser.
3)
When standby invalidate obsolete slots due to effective_wal_level on
primary changed to replica, it dumps below:
LOG: invalidating obsolete replication slot "slot_st2"
DETAIL: Logical decoding on standby requires "wal_level" >= "logical"
on the primary server
Shall we update this message as well to convey about slot-presence on primary.
DETAIL: Logical decoding on standby requires "wal_level" >= "logical"
or presence of logical slot on the primary server.
4)
I see that the slotsync worker is running all the time now as against
the previous behaviour where it will not start if wal_level is less
than logical or switched to '< logical' anytime. Even with wal_level
and effective_wal_level set to replica, slot-sync keeps on attempting
synchronization. This does not look correct. I think we need to find a
way to stop sot-sync worker when effective_wal_level is switched to
replica from logical.
5)
Can you please help me understand the changes at [1].
a) Why is it needed when we have code logic at [2]
b) in [1], why do we check n_inuse_logical_slots on standby and then
make decisions? Why not to disable logical-decoding directly just like
[2]
[1]:
+ if (xlrec.wal_level == WAL_LEVEL_LOGICAL)
+ {
+ /*
+ * If the primary increase WAL level to 'logical', we can
+ * unconditionally enable the logical decoding on the standby.
+ */
+ UpdateLogicalDecodingStatus(true);
+ }
+ else if (xlrec.wal_level == WAL_LEVEL_REPLICA &&
+ pg_atomic_read_u32(&ReplicationSlotCtl->n_inuse_logical_slots) == 0)
+ {
+ /*
+ * Disable the logical decoding if there is no in-use logical slot
+ * on the standby.
+ */
+ UpdateLogicalDecodingStatus(false);
+ }
[2]:
+ else if (info == XLOG_LOGICAL_DECODING_STATUS_CHANGE)
+ {
+ bool logical_decoding;
+
+ memcpy(&logical_decoding, XLogRecGetData(record), sizeof(bool));
+ UpdateLogicalDecodingStatus(logical_decoding);
+
+ /*
+ * Invalidate logical slots if we are in hot standby and the primary
+ * disabled the logical decoding.
+ */
+ if (!logical_decoding && InRecovery && InHotStandby)
+ InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
+ 0, InvalidOid,
+ InvalidTransactionId);
+
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->logicalDecodingEnabled = logical_decoding;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+ }
thanks
Shveta