Hi,

On Mon, Mar 23, 2026 at 6:20 AM shawn wang <[email protected]> wrote:
>
> Hi hackers,

Thank you for proposing this new feature.

>  == Motivation ==
>
> We operate a fleet of PostgreSQL instances with logical replication. On 
> several occasions, we have experienced production incidents where logical 
> decoding spill files (pg_replslot/<slot>/xid-*.spill) grew uncontrollably — 
> consuming tens of gigabytes and eventually filling up the data disk. This 
> caused the entire instance to go read-only, impacting not just replication 
> but all write workloads.
>
> The typical scenario is a large transaction (e.g. bulk data load or a 
> long-running DDL) combined with a subscriber that is either slow or 
> temporarily disconnected. The reorder buffer exceeds 
> logical_decoding_work_mem and starts spilling, but there is no upper bound on 
> how much can be spilled. The only backstop today is the OS returning ENOSPC, 
> at which point the damage is already done.

Having a lot of spill files also increases crash/recovery times.
However, "files spilling to disk fill the volume and cause downtime"
applies equally to WAL files, historical catalog snapshot files,
subtransaction overflow files, CLOG (and every other subsystem backed
by the SLRU data structure), etc.: basically any Postgres subsystem
that writes files to disk. I'm a bit worried that we may end up
solving disk-space issues, which IMHO are outside the database's
scope, inside the database. Others may have different opinions though.

How common is this issue? Could you please add a test case to the
proposed patch that, without this feature, would hit the issue
described?

Having said that, were alternatives considered, such as disabling
subscriptions when their slots are seen consuming too much disk space?

> We looked for existing protections:
>
> - max_slot_wal_keep_size: limits WAL retention, but does not affect spill 
> files at all.
> - logical_decoding_work_mem: controls *when* spilling starts, but not *how 
> much* can be spilled.
> - There is no existing GUC, patch, or commitfest entry that addresses a 
> spill file disk quota.

Interesting!

> The "Report reorder buffer size" patch (CF #6053, by Ashutosh Bapat) improves 
> observability of reorder buffer state, which is complementary — but 
> observability alone cannot prevent disk-full incidents.

With the proposed reorder buffer stats above, would it be possible to
build a monitoring solution (an extension or an external tool) that
disables subscriptions and notifies the admin? Would something like
that work?

> == Proposed solution ==
>
> The attached patch adds a new GUC:
> logical_decoding_spill_limit (integer, unit kB, default 0)
>
> When set to a positive value, it limits the total size of on-disk spill files 
> per replication slot. Key design points:
>
> Tracking: We add two new fields:
> - ReorderBuffer.spillBytesOnDisk — current total on-disk spill size for 
> this slot (unlike spillBytes, which is a cumulative statistics counter, 
> this is a live gauge).
> - ReorderBufferTXN.serialized_size — per-transaction on-disk size, so we 
> can accurately decrement the global counter during cleanup.
>
> Increment: In ReorderBufferSerializeChange(), after a successful write(), 
> both counters are incremented by the size written.
>
> Decrement: In ReorderBufferRestoreCleanup(), when spill files are unlinked, 
> the global counter is decremented by the transaction's serialized_size.
>
> Enforcement: In ReorderBufferCheckMemoryLimit(), before calling 
> ReorderBufferSerializeTXN(), we check:
>     if (spillBytesOnDisk + txn->size > spill_limit)
>         ereport(ERROR, ...)
> This is only checked on the spill-to-disk path — not on the streaming path 
> (which involves no disk I/O).
> Behavior on limit exceeded: An ERROR is raised with 
> ERRCODE_CONFIGURATION_LIMIT_EXCEEDED. The walsender exits, but the slot's 
> restart_lsn and confirmed_flush are preserved. The subscriber can reconnect 
> after the DBA:
>
> - increases logical_decoding_spill_limit, or
> - increases logical_decoding_work_mem (to reduce spilling), or
> - switches to a streaming-capable output plugin (which avoids spilling 
> entirely).

When logical_decoding_spill_limit is exceeded, isn't ERRORing out in
the walsender even more problematic? The replication slot would sit
inactive, causing table bloat and preventing tuple freezing, while WAL
files keep growing until the system eventually hits disk-space issues
anyway. It is like "we avoided disk-space issues for one subsystem,
but introduced them for another". This looks a bit problematic IMHO.
Others may have different opinions though.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

