In a blog post (https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/), I described how a PostgreSQL replica can enter a suboverflowed state when the following happens:
1. A long transaction starts.
2. A single SAVEPOINT is issued.
3. Many rows are updated on the primary, and the same rows are read from the replica.

This can cause significant performance degradation on a replica due to SubtransSLRU wait events, since the replica needs to perform a parent lookup on an ever-growing range of XIDs. Full details on how to reproduce this: https://gitlab.com/-/snippets/2187338.

The two main lines of code that cause the replica to enter the suboverflowed state are here (https://github.com/postgres/postgres/blob/317632f3073fc06047a42075eb5e28a9577a4f96/src/backend/storage/ipc/procarray.c#L2431-L2432):

    if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
        suboverflowed = true;

I noticed that lastOverflowedXid doesn't get cleared even after all subtransactions have completed. On a replica, it only seems to be updated via an XLOG_XACT_ASSIGNMENT record, but no such record is sent once subtransactions stop. If the XID wraps around again and a long transaction starts before lastOverflowedXid, the replica might unnecessarily enter the suboverflowed state again.

I've validated this by issuing a SAVEPOINT, running the read/write test, logging lastOverflowedXid to stderr, and then using pgbench to advance the XID with SELECT txid_current(). After many hours, I confirmed that lastOverflowedXid remained the same, and I could induce a high degree of SubtransSLRU wait events without issuing a new SAVEPOINT.

I'm wondering a few things:

1. Should lastOverflowedXid be reset to 0 at some point? I'm not sure if there's a good way at the moment for the replica to know that all subtransactions have completed.
2. Alternatively, should the epoch number be used to compare xmin and lastOverflowedXid?

To mitigate this issue, we've considered:

1. Restarting the replicas. This isn't great, and if another SAVEPOINT comes along, we'd have to do this again. It would be nice to be able to monitor the exact value of lastOverflowedXid.
2. Raising NUM_SUBTRANS_BUFFERS as a workaround until the scalable SLRU patches are available (https://commitfest.postgresql.org/34/2627/).
3. Issuing SAVEPOINTs periodically to "run away" from this wraparound issue.
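
To make the wraparound concern concrete, here is a minimal standalone sketch of the comparison involved. It is not PostgreSQL source: xid_precedes_or_equals is my simplified model of the modulo-2^32 logic behind TransactionIdPrecedesOrEquals, and snapshot_suboverflowed, snapshot_suboverflowed_full, and full_xid are hypothetical helpers invented for illustration. The 64-bit variant only shows what an epoch-qualified comparison (question 2 above) might look like, under the assumption that both values could be tracked as epoch plus XID.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Simplified model of the modulo-2^32 XID comparison. */
static bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
    /* Special XIDs compare as plain unsigned integers. */
    if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
        return id1 <= id2;

    /* Normal XIDs compare by signed 32-bit distance, so "older" and
     * "newer" are only meaningful within a window of 2^31 XIDs. */
    return (int32_t) (id1 - id2) <= 0;
}

/* Hypothetical helper mirroring the check quoted from procarray.c. */
static bool
snapshot_suboverflowed(TransactionId xmin, TransactionId last_overflowed_xid)
{
    return xid_precedes_or_equals(xmin, last_overflowed_xid);
}

/* Hypothetical: combine an epoch and a 32-bit XID into a 64-bit value. */
static uint64_t
full_xid(uint32_t epoch, TransactionId xid)
{
    return ((uint64_t) epoch << 32) | xid;
}

/* Hypothetical epoch-qualified variant of the same check. A stale
 * value from an earlier epoch can never appear to follow xmin. */
static bool
snapshot_suboverflowed_full(uint64_t full_xmin, uint64_t full_last_overflowed)
{
    return full_xmin <= full_last_overflowed;
}
```

For example, with lastOverflowedXid stuck at 1000, snapshot_suboverflowed(2000, 1000) is false, which is the expected result once the overflowed subtransactions are long gone. Once the XID counter has advanced by more than 2^31, say to an xmin of 2147485148, the 32-bit check treats xmin as preceding 1000 and returns true again, whereas snapshot_suboverflowed_full(full_xid(0, 2147485148u), full_xid(0, 1000)) stays false. In this model, a value of 0 for lastOverflowedXid never triggers the check for a normal xmin, which is why resetting it (question 1) would also avoid the false positive.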