Hi Masahiko,

I agree with your analysis. But why not take the synchronized_standby_slots parameter into account? It guarantees that the standbys whose physical slots are listed have received the WAL before logical WAL senders send the decoded changes. Let me reuse your setup:
node1 => node2 => node3
node1 is the primary; node2 and node3 are standbys.
slot1 is a failover slot created on node2.

node2: synchronized_standby_slots=node3, primary_conninfo=node1, slot1: failover=true
node3: synchronized_standby_slots=node2, primary_conninfo=node2, slot1: failover=true

Case 1) Switchover between node2 and node3
node1 => node3 => node2
node2: synchronized_standby_slots=node3, primary_conninfo=node3, slot1: failover=true
=> The slot could be overwritten: node2 now connects to node3, which is the same node listed in synchronized_standby_slots.

Case 2) Restart of node2
node1 => node2 => node3
node2: synchronized_standby_slots=node3, primary_conninfo=node1, slot1: failover=true
=> The slot could not be overwritten, because the node in primary_conninfo is not the one listed in synchronized_standby_slots.

Regards,
Fabrice

On Thu, Nov 20, 2025 at 9:35 PM Masahiko Sawada wrote:
> On Thu, Nov 20, 2025 at 6:26 AM Fabrice Chapuis wrote:
> >
> > > I think we need to clarify that suppose the standby has a slot with
> > > failover=true and synced=false and the primary has the slot with the
> > > same name, failover=true, and synced=true...
> >
> > I'm not sure I understand the semantics of the `synced` flag, but why
> > can the `synced` flag be true on a primary instance? AFAICS if
> > `synced=true` then it means that the slot is inactive and it is
> > synchronized with a slot on a remote instance. On a primary, what is the
> > meaning of having the flag synced set to true?
>
> I think that the synced can be true on the primary if the slot was
> previously synced and the instance is now working as the primary. But
> the synced flag being true doesn't mean anything on the primary. It
> works only on the standby.
>
> > There's already an open thread dealing with this issue [1].
> > The problem I see is being able to distinguish between 2 situations:
> > 1) A failover slot has been created on a standby (failover=true and
> > synced=false) in a context of cascading standby. In this case the slot
> > must not be deleted.
> > 2) A former primary has a slot (failover=true and synced=false) that
> > must be resynchronized and that can be overwritten.
>
> Right.
>
> > Why not use a slot's metadata (allow_overwrite) to treat these two
> > situations separately?
>
> I'm not sure that the allow_overwrite idea is the best approach. For
> example, suppose that in a cascading replication setup (node1 ->
> node2 -> node3) we create a failover slot on node2 (failover=true,
> synced=false, and allow_overwrite=false), the slot is synchronized to
> the node3 (failover=true, synced=true, allow_overwrite=false). If we
> do a switchover between node2 and node3, node3 joins the primary,
> node1, and node2 now joins node3 as a cascaded standby (i.e.,
> replication setup is now node1 -> node3 -> node2). I guess that in
> this case the slot on node2 wants to be overwritten by the one on the
> node3, but it's not allowed because the slot on node2 has
> allow_overwrite=false.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
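P.S. To make the proposed rule concrete, here is a minimal Python sketch of the decision node2 would take in Case 1 and Case 2. It is not PostgreSQL's slot sync code: StandbyConfig and may_overwrite_failover_slot are hypothetical names, primary_conninfo is reduced to the upstream node's name, and it assumes each physical slot listed in synchronized_standby_slots is named after the standby that uses it, as in the setup above.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyConfig:
    """Simplified view of a standby's settings (hypothetical, for illustration).

    Assumptions: primary_conninfo is reduced to the upstream node's name, and
    each physical slot in synchronized_standby_slots is named after its standby.
    """
    name: str
    primary_conninfo: str
    synchronized_standby_slots: list[str] = field(default_factory=list)


def may_overwrite_failover_slot(node: StandbyConfig, failover: bool, synced: bool) -> bool:
    """Hypothetical rule: may slot sync replace a local failover slot on `node`?

    Only the ambiguous case from this thread is decided here: a local slot
    with failover=true and synced=false whose name also exists upstream.
    """
    if not failover or synced:
        raise ValueError("rule only covers slots with failover=true, synced=false")
    # If we now stream from a node we previously waited for through
    # synchronized_standby_slots, the two standbys swapped roles (switchover),
    # so the upstream copy of the slot is treated as authoritative.
    return node.primary_conninfo in node.synchronized_standby_slots


# Case 1: after the switchover, node2 streams from node3 (listed in its own setting).
after_switchover = StandbyConfig("node2", primary_conninfo="node3",
                                 synchronized_standby_slots=["node3"])
print(may_overwrite_failover_slot(after_switchover, failover=True, synced=False))  # True

# Case 2: after a plain restart, node2 still streams from node1.
after_restart = StandbyConfig("node2", primary_conninfo="node1",
                              synchronized_standby_slots=["node3"])
print(may_overwrite_failover_slot(after_restart, failover=True, synced=False))  # False
```

The point of the sketch is that the decision only reads existing configuration, so, unlike allow_overwrite, it would need no extra per-slot metadata that must be flipped after a switchover.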
