lastOverflowedXid does not handle transaction ID wraparound

Stan Hu Tue, 12 Oct 2021 21:53:51 -0700

In a blog post 
(https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/),
I described how PostgreSQL can enter into a suboverflow condition on
the replica under a number of conditions:

1. A long transaction starts.
2. A single SAVEPOINT is issued.
3. Many rows are updated on the primary, and the same rows are read
from the replica.

This can cause a significant performance degradation with a replica
due to SubtransSLRU wait events since the replica needs to perform a
parent lookup on an ever-growing range of XIDs. Full details on how to
replicate this: https://gitlab.com/-/snippets/2187338.

The main two lines of code that cause the replica to enter in the
suboverflowed state are here
(https://github.com/postgres/postgres/blob/317632f3073fc06047a42075eb5e28a9577a4f96/src/backend/storage/ipc/procarray.c#L2431-L2432):

if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
suboverflowed = true;

I noticed that lastOverflowedXid doesn't get cleared even after all
subtransactions have been completed. On a replica, it only seems to be
updated via a XLOG_XACT_ASSIGNMENT, but no such message will be sent
if subtransactions halt. If the XID wraps around again and a long
transaction starts before lastOverflowedXid, the replica might
unnecessarily enter in the suboverflow condition again.

I've validated this by issuing a SAVEPOINT, running the read/write
test, logging lastOverflowedXid to stderr, and then using pg_bench to
advance XID with SELECT txid_current(). After many hours, I validated
that lastOverflowedXid remained the same, and I could induce a high
degree of SubtransSLRU wait events without issuing a new SAVEPOINT.

I'm wondering a few things:

1. Should lastOverflowedXid be reset to 0 at some point? I'm not sure
if there's a good way at the moment for the replica to know that all
subtransactions have completed.
2. Alternatively, should the epoch number be used to compare xmin and
lastOverflowedXid?

To mitigate this issue, we've considered:

1. Restarting the replicas. This isn't great, and if another SAVEPOINT
comes along, we'd have to do this again. It would be nice to be able
to monitor the exact value of lastOverflowedXid.
2. Raise the NUM_SUBTRANS_BUFFERS as a workaround until the scalable
SLRU patches are available
(https://commitfest.postgresql.org/34/2627/).
3. Issue SAVEPOINTs periodically to "run away" from this wraparound issue.

lastOverflowedXid does not handle transaction ID wraparound

Reply via email to