On Fri, May 15, 2026 at 11:03 AM Xuneng Zhou <[email protected]> wrote: > > Hi Hackers, > > pg_sync_replication_slots() now retries inside a single SQL function > call. Unlike the slotsync worker, it does not get a transaction boundary > between retry cycles, so allocations made while fetching and synchronizing > remote slots can accumulate until the function returns. > > The existing list_free_deep(remote_slots) is not enough to bound this. > It frees the List cells and the RemoteSlot structs stored as list > elements, but it does not recursively free allocations owned by those > structs, such as the copied slot name, plugin name, and database name. > It also does not release unrelated per-cycle allocations made while > fetching and processing the remote slots. > > This is probably modest in normal cases, as the retained memory is roughly > proportional to the number of retries times the number of remote slots. > Still, the function can wait for a long time on a lagging or > misconfigured standby, so the growth is unbounded for that call. > > The attached patch runs each manual retry cycle in a short-lived memory > context and resets it before the next attempt. The slot name list needed > across retries is copied outside that context. > > It also adds two local cleanups. Tuple slots created with > MakeSingleTupleTableSlot() are explicitly released with > ExecDropSingleTupleTableSlot() before clearing the walreceiver result. > The memory context would reclaim the storage eventually, but the slot also > holds a reference to the result tuple descriptor, so releasing it at the > matching ownership boundary seems clearer and follows nearby code. > > drop_local_obsolete_slots() now frees the temporary list container it > builds. The retry-cycle context would reclaim this list too, but freeing > it locally would make the helper self-contained. > > Politely CCed the original authors. > >
I like the core idea of this patch. It makes the flow cleaner and protects the flow from memory-leaks when dealing with a high number of slots. I think during implementation, we considered having a separate memory context, but since we were freeing the remote_slots then, we thought allocating a new memory context might not be required. But on re-thinking and reading the details above, IMO, it is a good improvement. But let's see what others have to say. A couple of minor comments: 1. I think we need to delete the new memory context in slotsync_failure_callback() as well. 2. Also, it will be good to add a comment before switiching to old-context before extract_slot_names(). Or better we can move the swicthing inside extract_slot_names() with a comment. thanks Shveta
