On Tue, Jul 25, 2023 at 10:33 PM Andres Freund <and...@anarazel.de> wrote:
>
> On 2023-07-25 14:31:00 +0530, Amit Kapila wrote:
> > To ensure that all the data has been sent during the upgrade, we can
> > ensure that each logical slot's confirmed_flush_lsn (position in the
> > WAL till which subscriber has confirmed that it has applied the WAL)
> > is the same as current_wal_insert_lsn. Now, because we don't send
> > XLOG_CHECKPOINT_SHUTDOWN even on clean shutdown, confirmed_flush_lsn
> > will never be the same as current_wal_insert_lsn. The one idea being
> > discussed in patch [1] (see 0003) is to ensure that each slot's LSN is
> > exactly XLOG_CHECKPOINT_SHUTDOWN ago which probably has some drawbacks
> > like what if we tomorrow add some other WAL in the shutdown checkpoint
> > path or the size of record changes then we would need to modify the
> > corresponding code in upgrade.
>
> Yea, that doesn't seem like a good path. But there is a variant that seems
> better: We could just scan the end of the WAL for records that should have
> been streamed out?
>
This sounds like a better idea. One way to realize this is to group the
slots by confirmed_flush_lsn and then scan the WAL based on that. Once we
ensure that the slot group with the highest confirmed_flush_lsn is
up-to-date (i.e., it has no pending WAL except for the shutdown
checkpoint), any slot group with a lower confirmed_flush_lsn would be
considered a group with pending data.

BTW, I think the main downside of not trying to send
XLOG_CHECKPOINT_SHUTDOWN for logical walsenders is that, even if today
there is no risk of hint bit updates (or any other way of generating WAL)
while decoding XLOG_CHECKPOINT_SHUTDOWN, there is no guarantee that this
will remain true in the future.

Is there anything I am missing here?

--
With Regards,
Amit Kapila.
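For illustration only, here is a minimal Python sketch of the grouping idea, under a simplified in-memory model of the WAL tail. WalRecord, Slot, only_shutdown_checkpoint_after and classify_slots are hypothetical names invented for this sketch; it is not pg_upgrade code and does not read real WAL. It assumes a record is "pending" for a slot if it starts at or after that slot's confirmed_flush_lsn.

```python
from dataclasses import dataclass
from itertools import groupby


# Hypothetical, simplified WAL record: start position and whether it is a shutdown checkpoint.
@dataclass
class WalRecord:
    start_lsn: int
    is_shutdown_checkpoint: bool


# Hypothetical, simplified logical slot.
@dataclass
class Slot:
    name: str
    confirmed_flush_lsn: int


def only_shutdown_checkpoint_after(wal, from_lsn):
    """True if every record starting at or after from_lsn is a shutdown checkpoint."""
    return all(r.is_shutdown_checkpoint for r in wal if r.start_lsn >= from_lsn)


def classify_slots(slots, wal):
    """Map slot name -> True if caught up, False if it is assumed to have pending data."""
    # Group slots by confirmed_flush_lsn, highest LSN first.
    ordered = sorted(slots, key=lambda s: s.confirmed_flush_lsn, reverse=True)
    groups = [(lsn, list(members))
              for lsn, members in groupby(ordered, key=lambda s: s.confirmed_flush_lsn)]
    result = {}
    if not groups:
        return result
    # Only the highest group needs a WAL scan.
    top_lsn, top_members = groups[0]
    top_ok = only_shutdown_checkpoint_after(wal, top_lsn)
    for s in top_members:
        result[s.name] = top_ok
    # Following the reasoning above, any group below the highest LSN is
    # treated as having pending data, so no further scan is done.
    for _, members in groups[1:]:
        for s in members:
            result[s.name] = False
    return result
```

With WAL records at 100 and 200 followed by a shutdown checkpoint at 300, slots confirmed at 300 come out as caught up and a slot confirmed at 200 does not. The design choice is that, however many slots there are, at most one scan of the WAL tail is needed.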