On Sat, May 16, 2026 at 7:35 PM Xuneng Zhou <[email protected]> wrote: > > On Fri, May 15, 2026 at 9:20 PM Xuneng Zhou <[email protected]> wrote: > > > > Hi Amit, > > > > On Fri, May 15, 2026 at 5:23 PM Amit Kapila <[email protected]> wrote: > > > > > > On Fri, May 15, 2026 at 11:02 AM Xuneng Zhou <[email protected]> wrote: > > > > > > > > pg_sync_replication_slots() now retries inside a single SQL function > > > > call. Unlike the slotsync worker, it does not get a transaction > > > > boundary > > > > between retry cycles, so allocations made while fetching and > > > > synchronizing > > > > remote slots can accumulate until the function returns. > > > > > > > > The existing list_free_deep(remote_slots) is not enough to bound this. > > > > It frees the List cells and the RemoteSlot structs stored as list > > > > elements, but it does not recursively free allocations owned by those > > > > structs, such as the copied slot name, plugin name, and database name. > > > > > > > > > > Right. > > > > > > > It also does not release unrelated per-cycle allocations made while > > > > fetching and processing the remote slots. > > > > > > > > > > BTW, did you notice via test or otherwise, what and how much other > > > unrelated memory is being allocated during each sync cycle if we > > > manually free the allocations you observed? This is mainly to learn > > > the impact of not doing all these allocations in some short-duration > > > memory context. > > > > I noticed this by reading the feature code while walking through the > > PG 19 release notes, not by observing actual memory growth in a > > running system. Besides the RemoteSlot field strings, there seems to > > be a few smaller per-cycle allocations that also accumulate: > > quote_literal_cstr() strings used to build the filtered query, a > > temporary TextDatumGetCString() result for invalidation_reason, the > > standalone TupleTableSlot in fetch_remote_slots(), and the list > > container built by get_local_synced_slots(). I chose a per-cycle > > memory context over manual pfrees to make the retry-cycle lifetime > > explicit and avoid maintaining a destructor for every current and > > future allocation in this path. It may be worth measuring how much > > extra memory actually accumulates during an extended wait to confirm > > the practical impact. > > I did some measurements for the memory growth in the manual > pg_sync_replication_slots() retry path. > > The test used 100 failover logical slots and forced > pg_sync_replication_slots() to keep retrying on the standby. Memory > was sampled with pg_log_backend_memory_contexts() on the backend > running the function. > > On HEAD, the short run showed: > timestamp total_bytes delta > 2026-05-16T10:47:54Z,63422,1339920, > 2026-05-16T10:47:56Z,63422,1405456,65536 > 2026-05-16T10:47:59Z,63422,1405456,0 > 2026-05-16T10:48:01Z,63422,1405456,0 > 2026-05-16T10:48:03Z,63422,1536528,131072 > 2026-05-16T10:48:05Z,63422,1536528,0 > > So the total increase was about 192 KiB. > > After adding a targeted cleanup for the copied RemoteSlot strings, the > same test showed: > > timestamp total_bytes delta > 2026-05-16T11:04:58Z,77000,1339920, > 2026-05-16T11:05:00Z,77000,1339920,0 > 2026-05-16T11:05:02Z,77000,1405456,65536 > 2026-05-16T11:05:04Z,77000,1405456,0 > > So the increase dropped to about 64 KiB. > > With a per-retry memory context around the fetch/synchronize cycle, > the same test stayed flat: > > 2026-05-16T11:09:10Z,84600,1315344, > 2026-05-16T11:09:12Z,84600,1315344,0 > 2026-05-16T11:09:14Z,84600,1315344,0 > 2026-05-16T11:09:16Z,84600,1315344,0 > 2026-05-16T11:09:18Z,84600,1315344,0 > 2026-05-16T11:09:20Z,84600,1315344,0 > > Looking at the memory-context dumps, the growth on HEAD was visible > under ExprContext. The grand-total increase matched the ExprContex > increase, which is consistent with retry-cycle allocations surviving > for the duration of the single SQL function call. > > That said, the practical severity looks small. This is mainly because > the retry loop is not a tight loop. With no slot activity, > wait_for_slot_activity() doubles the sleep time up to 30 seconds. > > So after about 51 seconds it retries only about twice per minute. For > 100 slots and no slot activity, a rough 1-hour test from the short > runs is on the order of a few MB on HEAD, and around 1 MB with the > manual RemoteSlot cleanup. For installations with fewer slots, it > should be smaller. > > So my read is: > the accumulation is real; > it is modest because of the retry backoff; > manually freeing RemoteSlot’s copied strings removes most of the > observed growth; > a per-retry memory context fully bounds the retry-cycle lifetime
Expectedly, the memory accumulation is much more pronounced with a churning slot to disablet the backoff. I'll share the findings later. -- Regards, Xuneng Zhou HighGo Software Co., Ltd.
