On Mon, Dec 30, 2024 at 10:08:31AM +0900, Michael Paquier wrote:
> On Fri, Dec 27, 2024 at 06:16:24PM +0300, Vitaly Davydov wrote:
> > As an idea, I would like to propose to store FullTransactionId in
> > global transaction state instead of TransactionId. I'm not sure, it
> > will consume significant additional memory, but it make the code
> > more clear and probably result into less number of locks.
>
> FWIW, I was wondering about doing the same thing. However, I have
> concluded that this some refactoring work out of the scope of fixing
> the primary issue we have here, as we are hit by the way the file
> names are built when we attempt to remove them.
>
> Note that the memory footprint of storing a FullTransactionId in
> twophase.c's GlobalTransactionData does not worry me much. It is more
> important to not increase the size of the two-phase state data,
> impacting files on disk and their WAL records. This size argument
> has been mentioned on the thread that has added epochs to the 2PC file
> names, as far as I recall.
I agree with accepting +4 bytes in GlobalTransactionData.
> At the end, I have applied two patches, the first one down to 13 that
> took care of the "future" issue, with tests added in the v14~v16
> range. v13 was lacking a perl routine, and it's not a big deal to not
> have coverage for this code path anyway. The second patch has been
> applied down to v17, to fix the epoch issue, with the more advanced
> tests.
>
> If you have a suggestion of patch to plug in a FullTransactionId into
> GlobalTransactionData rather than a TransactionId, feel free to
> propose something! Help is always welcome, and this would be
> HEAD-only work, making it less urgent to deal with.
I suspect we should do that and back-patch to v17 before the next releases,
because commit 7e125b2 wrote this function:
/*
* Compute FullTransactionId for the given TransactionId, using the
current
* epoch.
*/
static inline FullTransactionId
FullTransactionIdFromCurrentEpoch(TransactionId xid)
{
FullTransactionId fxid;
FullTransactionId nextFullXid;
uint32 epoch;
nextFullXid = ReadNextFullTransactionId();
epoch = EpochFromFullTransactionId(nextFullXid);
fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
return fxid;
}
I think "using the current epoch" is wrong for half of the nextFullXid values
having epoch > 0. For example, nextFullId==2^32 is in epoch 1, but all the
allowable XIDs are in epoch 0. (I mean "allowable" in the sense of
AssertTransactionIdInAllowableRange().) From then until we assign another
2^31 XIDs, epochs 0 and 1 are both expected in XID values. After 2^31 XID
assignments, every allowable XID be in epoch 1. Hence, twophase.c would need
two-epoch logic like we have in widen_snapshot_xid() and XLogRecGetFullXid().
Is that right? (I wrote this in a hurry, so this email may have more than the
standard level of errors.) Before commit 7e125b2, twophase also had that
logic. I didn't work out the user-visible consequences of that logic's new
absence here, but I bet on twophase recovery breakage. Similar problem here
(up to two epochs are acceptable, not just one):
+ /* Discard files from past epochs */
+ if (EpochFromFullTransactionId(fxid) <
EpochFromFullTransactionId(nextXid))
I wrote the attached half-baked patch to fix those, but I tend to think it's
better to use FullTransactionId as many places as possible in twophase.c.
(We'd still need to convert XIDs that we read from xl_xact_prepare records,
along the lines of XLogRecGetFullXid().) How do you see it?
Author: Noah Misch <[email protected]>
Commit: Noah Misch <[email protected]>
Fix twophase.c XID epoch tracking
Half the time after epoch 0, allowable XIDs span two epochs. This would
have no user-visible consequences during epoch 0, but I expect
(unconfirmed) twophase breakage during other epochs.
FIXME likely rework this in favor of broader fulltransaction use in
twophase.c
diff --git a/src/backend/access/transam/twophase.c
b/src/backend/access/transam/twophase.c
index a3190dc..7a16293 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -926,24 +926,6 @@ TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
/* State file support
*/
/************************************************************************/
-/*
- * Compute FullTransactionId for the given TransactionId, using the current
- * epoch.
- */
-static inline FullTransactionId
-FullTransactionIdFromCurrentEpoch(TransactionId xid)
-{
- FullTransactionId fxid;
- FullTransactionId nextFullXid;
- uint32 epoch;
-
- nextFullXid = ReadNextFullTransactionId();
- epoch = EpochFromFullTransactionId(nextFullXid);
-
- fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
- return fxid;
-}
-
static inline int
TwoPhaseFilePath(char *path, FullTransactionId fxid)
{
@@ -1283,7 +1265,7 @@ RegisterTwoPhaseRecord(TwoPhaseRmgrId rmid, uint16 info,
* contents of the file, issuing an error when finding corrupted data. If
* missing_ok is true, which indicates that missing files can be safely
* ignored, then return NULL. This state can be reached when doing recovery
- * after discarding two-phase files from other epochs.
+ * after discarding two-phase files from frozen epochs.
*/
static char *
ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
@@ -1299,7 +1281,7 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
int r;
FullTransactionId fxid;
- fxid = FullTransactionIdFromCurrentEpoch(xid);
+ fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(),
xid);
TwoPhaseFilePath(path, fxid);
fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
@@ -1664,15 +1646,13 @@ FinishPreparedTransaction(const char *gid, bool
isCommit)
/* Count the prepared xact as committed or aborted */
AtEOXact_PgStat(isCommit, false);
- /*
- * And now we can clean up any files we may have left. These should be
- * from the current epoch.
- */
+ /* And now we can clean up any files we may have left. */
if (ondisk)
{
FullTransactionId fxid;
- fxid = FullTransactionIdFromCurrentEpoch(xid);
+ fxid =
FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(),
+
xid);
RemoveTwoPhaseFile(fxid, true);
}
@@ -1749,8 +1729,7 @@ RecreateTwoPhaseFile(TransactionId xid, void *content,
int len)
COMP_CRC32C(statefile_crc, content, len);
FIN_CRC32C(statefile_crc);
- /* Use current epoch */
- fxid = FullTransactionIdFromCurrentEpoch(xid);
+ fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(),
xid);
TwoPhaseFilePath(path, fxid);
fd = OpenTransientFile(path,
@@ -1900,7 +1879,7 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
* This is called once at the beginning of recovery, saving any extra
* lookups in the future. Two-phase files that are newer than the
* minimum XID horizon are discarded on the way. Two-phase files with
- * an epoch older or newer than the current checkpoint's record epoch
+ * an epoch frozen relative to the current checkpoint's record epoch
* are also discarded.
*/
void
@@ -1971,7 +1950,6 @@ PrescanPreparedTransactions(TransactionId **xids_p, int
*nxids_p)
TransactionId origNextXid = XidFromFullTransactionId(nextXid);
TransactionId result = origNextXid;
TransactionId *xids = NULL;
- uint32 epoch = EpochFromFullTransactionId(nextXid);
int nxids = 0;
int allocsize = 0;
int i;
@@ -1988,11 +1966,7 @@ PrescanPreparedTransactions(TransactionId **xids_p, int
*nxids_p)
xid = gxact->xid;
- /*
- * All two-phase files with past and future epoch in
pg_twophase are
- * gone at this point, so we're OK to rely on only the current
epoch.
- */
- fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
+ fxid = FullTransactionIdFromAllowableAt(nextXid, xid);
buf = ProcessTwoPhaseBuffer(fxid,
gxact->prepare_start_lsn,
gxact->ondisk, false, true);
@@ -2055,12 +2029,9 @@ void
StandbyRecoverPreparedTransactions(void)
{
int i;
- uint32 epoch;
FullTransactionId nextFullXid;
- /* get current epoch */
nextFullXid = ReadNextFullTransactionId();
- epoch = EpochFromFullTransactionId(nextFullXid);
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
@@ -2074,11 +2045,7 @@ StandbyRecoverPreparedTransactions(void)
xid = gxact->xid;
- /*
- * At this stage, we're OK to work with the current epoch as
all past
- * and future files have been already discarded.
- */
- fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
+ fxid = FullTransactionIdFromAllowableAt(nextFullXid, xid);
buf = ProcessTwoPhaseBuffer(fxid,
gxact->prepare_start_lsn,
gxact->ondisk, true, false);
@@ -2108,12 +2075,9 @@ void
RecoverPreparedTransactions(void)
{
int i;
- uint32 epoch;
FullTransactionId nextFullXid;
- /* get current epoch */
nextFullXid = ReadNextFullTransactionId();
- epoch = EpochFromFullTransactionId(nextFullXid);
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
@@ -2127,10 +2091,6 @@ RecoverPreparedTransactions(void)
TransactionId *subxids;
const char *gid;
- /*
- * At this stage, we're OK to work with the current epoch as
all past
- * and future files have been already discarded.
- */
xid = gxact->xid;
/*
@@ -2142,7 +2102,7 @@ RecoverPreparedTransactions(void)
* SubTransSetParent has been set before, if the prepared
transaction
* generated xid assignment records.
*/
- fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
+ fxid = FullTransactionIdFromAllowableAt(nextFullXid, xid);
buf = ProcessTwoPhaseBuffer(fxid,
gxact->prepare_start_lsn,
gxact->ondisk, true, false);
@@ -2260,8 +2220,9 @@ ProcessTwoPhaseBuffer(FullTransactionId fxid,
return NULL;
}
- /* Discard files from past epochs */
- if (EpochFromFullTransactionId(fxid) <
EpochFromFullTransactionId(nextXid))
+ /* Discard files from frozen epochs */
+ if (EpochFromFullTransactionId(fxid) + 1 <
+ EpochFromFullTransactionId(nextXid))
{
if (fromdisk)
{
@@ -2576,8 +2537,8 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn,
char path[MAXPGPATH];
FullTransactionId fxid;
- /* Use current epoch */
- fxid = FullTransactionIdFromCurrentEpoch(hdr->xid);
+ fxid =
FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(),
+
hdr->xid);
TwoPhaseFilePath(path, fxid);
if (access(path, F_OK) == 0)
@@ -2676,10 +2637,8 @@ PrepareRedoRemove(TransactionId xid, bool giveWarning)
{
FullTransactionId fxid;
- /*
- * We should deal with a file at the current epoch here.
- */
- fxid = FullTransactionIdFromCurrentEpoch(xid);
+ fxid =
FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(),
+
xid);
RemoveTwoPhaseFile(fxid, giveWarning);
}
RemoveGXact(gxact);
diff --git a/src/backend/access/transam/xlogreader.c
b/src/backend/access/transam/xlogreader.c
index 3596af0..91b6a91 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -2166,28 +2166,14 @@ RestoreBlockImage(XLogReaderState *record, uint8
block_id, char *page)
FullTransactionId
XLogRecGetFullXid(XLogReaderState *record)
{
- TransactionId xid,
- next_xid;
- uint32 epoch;
-
/*
* This function is only safe during replay, because it depends on the
* replay state. See AdvanceNextFullTransactionIdPastXid() for more.
*/
Assert(AmStartupProcess() || !IsUnderPostmaster);
- xid = XLogRecGetXid(record);
- next_xid = XidFromFullTransactionId(TransamVariables->nextXid);
- epoch = EpochFromFullTransactionId(TransamVariables->nextXid);
-
- /*
- * If xid is numerically greater than next_xid, it has to be from the
last
- * epoch.
- */
- if (unlikely(xid > next_xid))
- --epoch;
-
- return FullTransactionIdFromEpochAndXid(epoch, xid);
+ return FullTransactionIdFromAllowableAt(TransamVariables->nextXid,
+
XLogRecGetXid(record));
}
#endif
diff --git a/src/backend/utils/adt/xid8funcs.c
b/src/backend/utils/adt/xid8funcs.c
index 4736755..b173956 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -155,35 +155,6 @@ TransactionIdInRecentPast(FullTransactionId fxid,
TransactionId *extracted_xid)
}
/*
- * Convert a TransactionId obtained from a snapshot held by the caller to a
- * FullTransactionId. Use next_fxid as a reference FullTransactionId, so that
- * we can compute the high order bits. It must have been obtained by the
- * caller with ReadNextFullTransactionId() after the snapshot was created.
- */
-static FullTransactionId
-widen_snapshot_xid(TransactionId xid, FullTransactionId next_fxid)
-{
- TransactionId next_xid = XidFromFullTransactionId(next_fxid);
- uint32 epoch = EpochFromFullTransactionId(next_fxid);
-
- /* Special transaction ID. */
- if (!TransactionIdIsNormal(xid))
- return FullTransactionIdFromEpochAndXid(0, xid);
-
- /*
- * The 64 bit result must be <= next_fxid, since next_fxid hadn't been
- * issued yet when the snapshot was created. Every TransactionId in the
- * snapshot must therefore be from the same epoch as next_fxid, or the
- * epoch before. We know this because next_fxid is never allow to get
- * more than one epoch ahead of the TransactionIds in any snapshot.
- */
- if (xid > next_xid)
- epoch--;
-
- return FullTransactionIdFromEpochAndXid(epoch, xid);
-}
-
-/*
* txid comparator for qsort/bsearch
*/
static int
@@ -420,12 +391,18 @@ pg_current_snapshot(PG_FUNCTION_ARGS)
nxip = cur->xcnt;
snap = palloc(PG_SNAPSHOT_SIZE(nxip));
- /* fill */
- snap->xmin = widen_snapshot_xid(cur->xmin, next_fxid);
- snap->xmax = widen_snapshot_xid(cur->xmax, next_fxid);
+ /*
+ * Fill. This is the current backend's active snapshot, so MyProc->xmin
+ * is <= all these XIDs. As long as that remains so, oldestXid can't
+ * advance past any of these XIDs. Hence, these XIDs remain allowable
+ * relative to next_fxid.
+ */
+ snap->xmin = FullTransactionIdFromAllowableAt(next_fxid, cur->xmin);
+ snap->xmax = FullTransactionIdFromAllowableAt(next_fxid, cur->xmax);
snap->nxip = nxip;
for (i = 0; i < nxip; i++)
- snap->xip[i] = widen_snapshot_xid(cur->xip[i], next_fxid);
+ snap->xip[i] =
+ FullTransactionIdFromAllowableAt(next_fxid,
cur->xip[i]);
/*
* We want them guaranteed to be in ascending order. This also removes
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 0cab865..48fce16 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -77,6 +77,35 @@ FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId
xid)
return result;
}
+/*
+ * Compute FullTransactionId for the given TransactionId, assuming xid was
+ * between [oldestXid, nextXid] at the time when TransamVariables->nextXid was
+ * nextFullXid.
+ */
+static inline FullTransactionId
+FullTransactionIdFromAllowableAt(FullTransactionId nextFullXid,
+ TransactionId
xid)
+{
+ uint32 epoch;
+
+ /* Special transaction ID. */
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+
+ /*
+ * The 64 bit result must be <= nextFullXid, since nextFullXid hadn't
been
+ * issued yet when xid was in the past. The xid must therefore be from
+ * the epoch of nextFullXid or the epoch before. We know this because
we
+ * must remove (by freezing) an XID before assigning the XID half an
epoch
+ * ahead of it.
+ */
+ epoch = EpochFromFullTransactionId(nextFullXid);
+ if (xid > XidFromFullTransactionId(nextFullXid))
+ epoch--;
+
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
static inline FullTransactionId
FullTransactionIdFromU64(uint64 value)
{