On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote: > If the WAL segment for the specified restart_lsn (STOP_LSN of the backup) > exists, then the function will create a physical replication slot and will > keep all the WAL segments required by the replica to catch up with the > primary. Otherwise, it returns error, which means that the required WAL > segments have been already utilised, so we do need to take a new backup. > Without passing this newly added parameter > pg_create_physical_replication_slot() works as before. > > What do you think about this?
I think that this was discussed in the past (perhaps one of the threads related to WAL advancing actually?), and this stuff is full of holes when it comes to think about error handling with checkpoints running in parallel, potentially doing recycling of segments you would expect to be around based on your input value for restart_lsn *while* pg_create_physical_replication_slot() is still running and manipulating the on-disk slot information. I suspect that this also breaks a couple of assumptions behind concurrent calls of the minimum LSN calculated across slots when a caller sees fit to recompute the thresholds (WAL senders mainly here, depending on the replication activity). -- Michael
signature.asc
Description: PGP signature