Re: Synchronizing slots from primary to standby
> I might be missing something but isn’t it okay even if the new primary
> server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
> restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
> LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
> transactions that were committed after the remote_lsn. So the
> subscriber can resume logical replication with the new primary without
> any data loss.

Maybe I'm misreading, but I thought the purpose of this was to make sure that the logical subscriber does not have data that has not been replicated to the new primary. The use case I can think of is when synchronous_commit is enabled and failover occurs. If we didn't have this set, isn't it possible that the logical subscriber has extra commits that aren't present on the newly promoted primary?

And sorry, I accidentally started a new thread in my last reply. Re-pasting some of my previous questions/comments:

wait_for_standby_confirmation does not update standby_slot_names once it's in its loop, so a wrong value can't be fixed with a SIGHUP. Similarly, synchronize_slot_names isn't updated once the worker is launched.

If a logical slot was dropped on the writer, should the worker drop logical slots that it was previously synchronizing but are no longer present? Or should we leave that to the user to manage? I'm trying to think why users would want to sync logical slots to a reader but not have them dropped as well once they're no longer present.

Is there a reason we're deciding to use one syncing worker per database instead of one general worker that syncs across all the databases? I imagine I'm missing something obvious here.

As for how standby_slot_names should be configured, I'd prefer flexibility similar to what we have for synchronous_standby_names, since that seems the most analogous. It'd provide flexibility for failovers, which I imagine is the most common use case.
On 1/20/22, 9:34 PM, "Masahiko Sawada" wrote:

On Wed, Dec 15, 2021 at 7:13 AM Peter Eisentraut wrote:
>
> On 31.10.21 11:08, Peter Eisentraut wrote:
> > I want to reactivate $subject. I took Petr Jelinek's patch from [0],
> > rebased it, added a bit of testing. It basically works, but as
> > mentioned in [0], there are various issues to work out.
> >
> > The idea is that the standby runs a background worker to periodically
> > fetch replication slot information from the primary. On failover, a
> > logical subscriber would then ideally find up-to-date replication slots
> > on the new publisher and can just continue normally.
> >
> > So, again, this isn't anywhere near ready, but there is already a lot
> > here to gather feedback about how it works, how it should work, how to
> > configure it, and how it fits into an overall replication and HA
> > architecture.
>
> The second, standby_slot_names, is set on the primary. It holds back
> logical replication until the listed physical standbys have caught up.
> That way, when failover is necessary, the promoted standby is not behind
> the logical replication consumers.

I might be missing something but isn’t it okay even if the new primary server is behind the subscribers? IOW, even if two slot's LSNs (i.e., restart_lsn and confirm_flush_lsn) are behind the subscriber's remote LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only transactions that were committed after the remote_lsn. So the subscriber can resume logical replication with the new primary without any data loss.

The new primary should not be ahead of the subscribers, because in that case it forwards the logical replication start LSN to the slot’s confirm_flush_lsn.
But it cannot happen since the remote LSN of the subscriber’s origin is always updated first, then the confirm_flush_lsn of the slot on the primary is updated, and then the confirm_flush_lsn of the slot on the standby is synchronized.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
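For what it's worth, the ordering argument above can be checked with a toy simulation (the names and LSN values here are illustrative, not PostgreSQL internals): because the subscriber's origin advances first, then the primary's slot, and only then the synced standby slot, the synced confirm_flush_lsn can never get ahead of the subscriber's remote_lsn, even when the sync worker lags.

```python
# Toy model of the update ordering described above. Class and field
# names are invented for illustration; LSNs are plain integers.

class Cluster:
    def __init__(self):
        self.remote_lsn = 0       # subscriber's pg_replication_origin.remote_lsn
        self.primary_confirm = 0  # confirm_flush_lsn of the slot on the primary
        self.standby_confirm = 0  # synced confirm_flush_lsn on the standby

    def apply_commit(self, lsn):
        # 1. the subscriber applies the change and advances its origin first
        self.remote_lsn = lsn
        # 2. then the subscriber's feedback advances the primary's slot
        self.primary_confirm = lsn

    def sync_to_standby(self):
        # 3. the slot-sync worker copies the primary's slot, possibly late
        self.standby_confirm = self.primary_confirm

c = Cluster()
for lsn in (100, 200, 300):
    c.apply_commit(lsn)
    if lsn == 200:
        c.sync_to_standby()   # the sync worker lags behind the last commit

# After failover, the promoted standby's slot is at most as new as the
# subscriber's origin, so replication resumes without data loss.
assert c.standby_confirm <= c.remote_lsn
assert (c.remote_lsn, c.standby_confirm) == (300, 200)
```

The invariant holds for any interleaving of `sync_to_standby` with commits, since step 3 only ever copies a value produced after steps 1 and 2.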
Synchronizing slots from primary to standby
Hello,

The doc mentions that standby_slot_names is only to be used for waiting on physical slots. However, it seems like we calculate the flush_pos for logical slots as well in wait_for_standby_confirmation?

Re-posting some of my previous comments since it seems like they got lost...

In wait_for_standby_confirmation, should we move the standby_slot_names -> namelist parsing into the while (true) block, so that if we have wrong values set, we can fix them with a SIGHUP? Similarly, in slotsync.c, we never update synchronize_slot_names once the worker is launched.

If a logical slot was dropped on the writer, should the worker drop logical slots that it was previously synchronizing but are no longer present? Or should we leave that to the user to manage? I'm trying to think why users would want to sync logical slots to a reader but not have them dropped if we're able to detect they're no longer present on the writer. Maybe if there were a use case for setting synchronize_slot_names one slot at a time, you wouldn't want other logical slots to be dropped, but dropping sounds like reasonable behavior for '*'?

Is there a reason we're deciding to use one syncing worker per database instead of one general worker that syncs across all the databases? I imagine I'm missing something obvious here.

As for how standby_slot_names should be configured, I'd prefer flexibility similar to what we have for synchronous_standby_names, since that seems the most analogous. It'd provide flexibility for failovers, which I imagine is the most common use case.

John H

On 1/3/22, 5:49 AM, "Peter Eisentraut" wrote:

Here is an updated patch to fix some build failures. No feature changes.

On 14.12.21 23:12, Peter Eisentraut wrote:
> On 31.10.21 11:08, Peter Eisentraut wrote:
>> I want to reactivate $subject. I took Petr Jelinek's patch from [0],
>> rebased it, added a bit of testing. It basically works, but as
>> mentioned in [0], there are various issues to work out.
>>
>> The idea is that the standby runs a background worker to periodically
>> fetch replication slot information from the primary. On failover, a
>> logical subscriber would then ideally find up-to-date replication
>> slots on the new publisher and can just continue normally.
>>
>> So, again, this isn't anywhere near ready, but there is already a lot
>> here to gather feedback about how it works, how it should work, how to
>> configure it, and how it fits into an overall replication and HA
>> architecture.
>
> Here is an updated patch. The main changes are that I added two
> configuration parameters. The first, synchronize_slot_names, is set on
> the physical standby to specify which slots to sync from the primary.
> By default, it is empty. (This also fixes the recovery test failures
> that I had to disable in the previous patch version.) The second,
> standby_slot_names, is set on the primary. It holds back logical
> replication until the listed physical standbys have caught up. That
> way, when failover is necessary, the promoted standby is not behind the
> logical replication consumers.
>
> In principle, this works now, I think. I haven't made much progress in
> creating more test cases for this; that's something that needs more
> attention.
>
> It's worth pondering what the configuration language for
> standby_slot_names should be. Right now, it's just a list of slots that
> all need to be caught up. More complicated setups are conceivable.
> Maybe you have standbys S1 and S2 that are potential failover targets
> for logical replication consumers L1 and L2, and also standbys S3 and
> S4 that are potential failover targets for logical replication
> consumers L3 and L4. Viewed like that, this setting could be a
> replication slot setting.
>
> The setting might also have some relationship with
> synchronous_standby_names. Like, if you have synchronous_standby_names
> set, then that's a pretty good indication that you also want some or
> all of those standbys in standby_slot_names. (But note that one is
> slots and one is application names.) So there are a variety of
> possibilities.
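As an aside on the semantics being discussed: the core gating behavior of standby_slot_names can be sketched as follows (a deliberate simplification with made-up names, not the patch's code). Logical consumers are held back to the minimum flush LSN among the listed physical slots, so the slowest listed standby determines how far logical replication may advance.

```python
# Simplified sketch of holding back logical replication until the
# physical standbys listed in standby_slot_names have caught up.
# Function and variable names are illustrative only.

def logical_send_limit(standby_slot_names, physical_slots):
    """Return the highest LSN logical consumers may receive."""
    flush_lsns = [physical_slots[name] for name in standby_slot_names]
    if not flush_lsns:
        return float("inf")   # no gating configured: no hold-back
    return min(flush_lsns)    # the slowest listed standby wins

# Flush positions reported by three physical standbys (toy LSNs).
physical_slots = {"s1": 500, "s2": 450, "s3": 700}

# Only s1 and s2 are failover targets for this consumer group, so s3's
# position is irrelevant to the limit.
limit = logical_send_limit(["s1", "s2"], physical_slots)
assert limit == 450

# A commit at LSN 480 has been decoded but may not yet be sent to
# logical consumers, because s2 has not flushed past it.
commit_lsn = 480
assert not (commit_lsn <= limit)
```

Grouping consumers with different failover targets, as in the S1/S2 vs. S3/S4 example above, would amount to evaluating this limit per consumer with a different slot list, which is one argument for making it a per-slot setting.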
Re: Synchronizing slots from primary to standby
Hello,

I started taking a brief look at the v2 patch, and it does appear to work for the basic case: the logical slot is synchronized across, and I can connect to the promoted standby and stream changes afterwards.

It's not clear to me what the correct behavior is when a logical slot that has been synced to the replica is then deleted on the writer. Would we expect this to be propagated, or is it left up to the end user to manage?

> + rawname = pstrdup(standby_slot_names);
> + SplitIdentifierString(rawname, ',', &namelist);
> +
> + while (true)
> + {
> +     int         wait_slots_remaining;
> +     XLogRecPtr  oldest_flush_pos = InvalidXLogRecPtr;
> +     int         rc;
> +
> +     wait_slots_remaining = list_length(namelist);
> +
> +     LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +     for (int i = 0; i < max_replication_slots; i++)
> +     {

Even though standby_slot_names is PGC_SIGHUP, we never reload/re-process the value. If we have a wrong entry in there, the backend becomes stuck until we re-establish the logical connection. Adding "postmaster/interrupt.h" with ConfigReloadPending / ProcessConfigFile does seem to work.

Another thing I noticed is that once it starts waiting in this block, Ctrl+C doesn't seem to terminate the backend:

pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
...
^Cpg_recvlogical: error: unexpected termination of replication stream:

The logical backend connection is still present:

ps aux | grep 51263
hsuchen 51263 80.7 0.0 320180 14304 ?  Rs  01:11  3:04 postgres: walsender hsuchen [local] START_REPLICATION

pstack 51263
#0 0x7ffee99e79a5 in clock_gettime ()
#1 0x7f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2 0x0075f141 in WaitEventSetWait ()
#3 0x0075f565 in WaitLatch ()
#4 0x00720aea in ReorderBufferProcessTXN ()
#5 0x007142a6 in DecodeXactOp ()
#6 0x0071460f in LogicalDecodingProcessRecord ()

It can be terminated with a pg_terminate_backend though.
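The reload problem described above amounts to parsing standby_slot_names once, before entering the loop. The shape of the suggested fix can be sketched language-neutrally (illustrative Python standing in for the patch's C; `reload_pending` plays the role of ConfigReloadPending, and re-reading the dict stands in for ProcessConfigFile): re-check for a pending reload at the top of every iteration so a SIGHUP can replace a bad slot list mid-wait.

```python
# Sketch of re-processing a PGC_SIGHUP-style setting inside a wait
# loop, analogous to checking ConfigReloadPending / ProcessConfigFile
# each iteration. All names and structure are illustrative.

config = {"standby_slot_names": "bad_slot"}   # starts with a wrong entry
reload_pending = False

def split_identifier_string(raw):
    # Toy stand-in for SplitIdentifierString(): comma-separated names.
    return [s.strip() for s in raw.split(",") if s.strip()]

def wait_for_standby_confirmation(confirmed):
    global reload_pending
    while True:
        # Re-parse the GUC every iteration, not just once before the
        # loop, so a SIGHUP-time change takes effect while waiting.
        if reload_pending:
            reload_pending = False
            # (in the server this is where ProcessConfigFile would run)
        namelist = split_identifier_string(config["standby_slot_names"])
        if all(name in confirmed for name in namelist):
            return namelist
        # Simulate the operator fixing the GUC and sending SIGHUP;
        # without per-iteration re-parsing we would spin forever here.
        config["standby_slot_names"] = "good_slot"
        reload_pending = True

names = wait_for_standby_confirmation(confirmed={"good_slot"})
assert names == ["good_slot"]
```

If the parse stayed outside the loop, the first (bad) namelist would be checked forever; moving it inside lets the second iteration pick up the corrected value and return.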
If we have a physical slot named foo on the standby, and a logical slot is then created on the writer with the same slot name, it does error out on the replica, although it also prevents other slots from being synchronized, which is probably fine:

2021-12-16 02:10:29.709 UTC [73788] LOG: replication slot synchronization worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR: cannot use physical replication slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG: unregistering background worker "replication slot synchronization worker"

On 12/14/21, 2:26 PM, "Peter Eisentraut" wrote:

On 28.11.21 07:52, Bharath Rupireddy wrote:
> 1) Instead of a new LIST_SLOT command, can't we use
> READ_REPLICATION_SLOT (slight modifications need to be done to make
> it support logical replication slots and to get more information from
> the subscriber)?

I looked at that but didn't see an obvious way to consolidate them. This is something we could look at again later.

> 2) How frequently is the new bg worker going to sync the slot info?
> How can it ensure that the latest information exists, say, when the
> subscriber is down/crashed before it picks up the latest slot
> information?

The interval is currently hardcoded, but could be a configuration setting. In the v2 patch, there is a new setting that orders physical replication before logical so that the logical subscribers cannot get ahead of the physical standby.

> 3) Instead of the subscriber pulling the slot info, why can't the
> publisher (via the walsender or a new bg worker maybe?) push the
> latest slot info? I'm not sure we want to add more functionality to
> the walsender; if yes, isn't it going to be much simpler?

This sounds like the failover slot feature, which was rejected.
Re: Physical slot restart_lsn advances incorrectly requiring restore from archive
Hi Horiguchi-san,

I'll take a look at that thread and see if I can reproduce with the attached patch. It seems like it would directly address this issue. Thanks for taking a look.

Cheers,

John H

On Thu, Jul 16, 2020 at 11:00 AM Kyotaro Horiguchi wrote:

Hello, John.

At Fri, 10 Jul 2020 20:44:30 +, "Hsu, John" wrote in
> Hi hackers,
>
> We believe we’re seeing a problem with how a physical slot’s restart_lsn is
> advanced, leading to replicas needing to restore from archive in order for
> replication to resume.
> The logs below are from reproductions against 10.13. I’m still working on
> reproducing it for 12.3.
>
> A WAL write spans two WAL segments.
> The write to the first WAL segment is complete, but not to the second segment.
> The write to the first WAL segment is acknowledged as flushed by the Postgres
> replica.
> The primary restarts before the write to the second segment completes. This
> also means the complete WAL record was never written.
> Crash recovery finishes at a record before the incomplete WAL write.
> The replica now starts the slot at the next WAL segment, since the previous
> segment was already acknowledged as flushed.
...
> Redo finishes at 0/2BB0 even though the flush we received from
> the replica is already at 0/2C00.
> This is problematic because the replica reconnects to the slot
> telling it to start past the new redo point.

Yeah, that is a problem not only related to restart_lsn. The same cause leads to another issue of inconsistent archive, as discussed in [1].

1: https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com

> The attached patch (against 10) attempts to address this by keeping
> track of the first flushLsn in the current segNo, and waiting until we
> receive one after that before updating. This prevents the WAL from
> rotating out of the primary, and a reboot of the replica will fix it
> instead of needing to restore from archive.
On the other hand, we can and should advance restart_lsn when we know that the last record is complete. I think the patch in thread [2] would fix your issue. With that patch, the primary doesn't send a continuation record at the end of a segment until the whole record is flushed to the WAL file.

2: https://www.postgresql.org/message-id/20200625.153532.379700510444980240.horikyota.ntt%40gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
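The failure mode being fixed can be illustrated with a toy model (invented names and byte positions, not WAL internals): when a record spans a segment boundary, the position that is safe to acknowledge, and hence safe to use for advancing restart_lsn, is the end of the last fully flushed record, not the raw number of flushed bytes.

```python
# Toy model: a record spanning a segment boundary must not let the
# acknowledged flush position advance into the next segment until the
# whole record is durable. Names and sizes are illustrative.

SEG_SIZE = 1000  # toy segment size in bytes

def safe_flush_lsn(flushed_bytes, records):
    """Clamp the flush position to the end of the last complete record.

    'records' is a list of (start, length) pairs in WAL order.
    """
    safe = 0
    for start, length in records:
        if start + length <= flushed_bytes:
            safe = start + length   # record fully flushed: safe to ack
        else:
            break                   # incomplete record: stop here
    return safe

# Record A occupies bytes 900..980; record B occupies 980..1100 and
# therefore spans the boundary between segment 1 and segment 2.
records = [(900, 80), (980, 120)]

# Only the first segment (bytes 0..1000) has been written and flushed.
# Naively acknowledging all 1000 flushed bytes would let the slot's
# restart_lsn move into segment 2 while record B is still incomplete,
# which is exactly the reported failure.
assert safe_flush_lsn(1000, records) == 980
assert safe_flush_lsn(1000, records) < SEG_SIZE

# Once record B is fully flushed, the safe position may cross over.
assert safe_flush_lsn(1100, records) == 1100
```

This clamping is the effect of both proposed approaches: the attached patch achieves it on the receiver side by delaying the restart_lsn update, while the patch in [2] achieves it on the sender side by withholding the continuation record until the whole record is flushed.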