On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
>
> When analyzing one BF error[1], we find an issue of slotsync: Since we don't
> perform logical decoding for the synced slots when syncing the lsn/xmin of
> slot, no logical snapshots will be serialized to disk. So, when user starts to
> use these synced slots after promotion, it needs to re-build the consistent
> snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn
> position indicates that there are running transactions. This however could
> cause the data that before the consistent point to be missed[2].
>
> This issue doesn't exist on the primary because the snapshot at restart_lsn
> should have been serialized to disk (SnapBuildProcessRunningXacts ->
> SnapBuildSerialize), so even if the logical decoding restarts, it can find
> consistent snapshot immediately at restart_lsn.
>
> To fix this, we could use the fast forward logical decoding to advance the
> synced
> slot's lsn/xmin when syncing these values instead of directly updating the
> slot's info. This way, the snapshot will be serialized to disk when decoding.
> If we could not reach to the consistent point at the remote restart_lsn, the
> slot is marked as temp and will be persisted once it reaches the consistent
> point. I am still analyzing the fix and will share once ready.
>
Yes, we can use this, but one thing to note is that CreateDecodingContext()
will expect that the slot's database and the current database are the same. I
think the reason for that is that we need to check the system tables of the
current database while decoding and sending data to the output plugin, which
won't be a requirement for the fast_forward case. So, we need to skip that
check in fast_forward mode.

Next, I was thinking about the case of updating the restart and
confirmed_flush LSNs for the first time while syncing the slots. I think we
can keep the current logic as it is, based on the following analysis.

For each logical slot, the cases possible on the primary are:
1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached
the consistent point.
2. The restart_lsn doesn't have a serialized snapshot but has reached a
consistent point.
3. The restart_lsn has a serialized snapshot, which means it has reached a
consistent point as well.

Considering we keep the logic to reserve the initial WAL positions the same as
the current one (reserve WAL for the currently active local slot using the
specified WAL location (restart_lsn); if the given WAL location has been
removed, reserve WAL using the oldest existing WAL segment), I could think of
the below scenarios:

A. For 1, we shouldn't sync the slot, as it still wouldn't have been marked
persistent on the primary.

B. For 2, we would sync the slot.
B1. If remote_restart_lsn >= local_restart_lsn, then advance the slot by
calling pg_logical_replication_slot_advance().
B11. If we reach the consistent point, then it should be okay, because after
promotion as well we should reach the consistent point.
B111. But again, is it possible that there is some xact that comes before the
consistent_point on the primary and the same comes after the consistent_point
on the standby? This shouldn't matter, as we will start decoding transactions
only after confirmed_flush_lsn, which would be the same on the primary and the
standby.
B12. If we haven't reached the consistent_point, then we won't mark the slot
as persistent, and at the next sync we will do the same till it reaches the
consistent_point. At that time, the situation will be similar to B11.
B2. If remote_restart_lsn < local_restart_lsn, then we will wait for the next
sync cycle and keep the slot as temporary. Once, in the next or some
consecutive sync cycle, we reach the condition
remote_restart_lsn >= local_restart_lsn, we will proceed to advance the slot,
and we should have the same behavior as B1.

C. For 3, we would sync the slot, but the behavior should be the same as B.

Thoughts?

--
With Regards,
Amit Kapila.