On Tue, Mar 26, 2024 at 2:11 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> On 2024-Mar-26, Amit Kapila wrote:
>
> > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> > > On 2024-Mar-26, Amit Kapila wrote:
> > > > I would also like to solicit your opinion on the other slot-level
> > > > parameter we are planning to introduce. This new slot-level parameter
> > > > will be named as inactive_timeout.
> > >
> > > Maybe inactivity_timeout?
> > >
> > > > This will indicate that once the slot is inactive for the
> > > > inactive_timeout period, we will invalidate the slot. We are also
> > > > discussing to have this parameter (inactive_timeout) as GUC [1]. We
> > > > can have this new parameter both at the slot level and as well as a
> > > > GUC, or just one of those.
> > >
> > > replication_slot_inactivity_timeout?
> >
> > So, it seems you are okay to have this parameter both at slot level
> > and as a GUC.
>
> Well, I think a GUC is good to have regardless of the slot parameter,
> because the GUC can be used as an instance-wide protection against going
> out of disk space because of broken replication. However, now that I
> think about it, I'm not really sure about invalidating a slot based on
> time rather on disk space, for which we already have a parameter; what's
> your rationale for that? The passage of time is not a very good
> measure, really, because the amount of WAL being protected has wildly
> varying production rate across time.
>

An inactive slot not only blocks WAL from being removed but also prevents
vacuum from proceeding. There is also a risk of transaction ID
wraparound. See email [1] for more context.

> I can only see a timeout being useful as a parameter if its default
> value is not the special disable value; say, the default timeout is 3
> days (to be more precise -- the period from Friday to Monday, that is,
> between DBA leaving the office one week until discovering a problem when
> he returns early next week). This way we have a built-in mechanism that
> invalidates slots regardless of how big the WAL partition is.
>

We can have a default value for this parameter, but it has the
potential to break replication, so I am not sure what a good
default value would be.

>
> I'm less sure about the slot parameter; in what situation do you need to
> extend the life of one individual slot further than the life of all the
> other slots?

I was thinking of an idle-slot scenario where a slot from one
particular subscriber (or output plugin) is inactive due to some
maintenance activity. But it should be okay to have only a GUC for this
for now.

[1] - https://www.postgresql.org/message-id/20240325195443.GA2923888%40nathanxps13

-- 
With Regards,
Amit Kapila.