Hi, While self reviewing a patch I'm about to send I changed the assertion in index_getnext_tid from Assert(TransactionIdIsValid(RecentGlobalXmin)) to instead test (via an indirection) Assert(TransactionIdIsValid(MyProc->xmin))
Without ->xmin being set, it's not safe to do scans. And checking RecentGlobalXmin doesn't really do much. But, uh. That doesn't get very far through the regression tests. Turns out that I am to blame for that. All the way back in 9.4. For logical decoding I needed to make ScanPgRelation() use a specific type of snapshot during one corner case of logical decoding. For reasons lost to time, I didn't continue to pass NULL to systable_beginscan() in the plain case, but did an explicit GetCatalogSnapshot(RelationRelationId). Note the missing RegisterSnapshot()... That's bad because: a) If invalidation processing triggers a InvalidateCatalogSnapshot(), the contained SnapshotResetXmin() may find no other snapshot, and reset ->xmin. Which then may cause relevant row versions to be removed. b) If there's a subsequent GetCatalogSnapshot() during invalidation processing, that will GetSnapshotData() into the snapshot currently being used. The fix itself is trivial, just pass NULL for the normal case, rather than doing GetCatalogSnapshot(). But I think this points to some severe holes in relevant assertions / infrastructure: 1) Asserting that RecentGlobalXmin is set - like many routines do - isn't meaningful, because it stays set even if SnapshotResetXmin() releases the transaction's snapshot. These are fairly old assertions (d53a56687f3d). As far as I can tell these routines really should verify that a snapshot is set. 2) I think we need to reset TransactionXmin, RecentXmin whenever SnapshotResetXmin() clears xmin. While we'll set them again the next time a snapshot is acquired, the fact that they stay set seems likely to hide bugs. We also could remove TransactionXmin and instead use the pgproc/pgxact's ->xmin. I don't really see the point of having it? 3) Similarly, I think we ought to reset reset RecentGlobal[Data]Xmin at the end of the transaction or such. But I'm not clear what protects those values from being affected by wraparound in a longrunning transaction? Initially they are obviously protected against that due to the procarray - but once the oldest procarray entry releases its snapshot, the global xmin horizon can advance. That allows transactions that up to ~2 billion into the future of the current backend, whereas RecentGlobalXmin might be nearly ~2 billion transactions in the past relative to ->xmin. That might not have been a likely problem many years ago, but seems far from impossible today? I propose to commit a fix, but then also add an assertion for TransactionIdIsValid(MyPgXact->xmin) instead (or in addition) to the TransactionIdIsValid(RecentGlobalXmin) tests right now. And in master clear the various *Xmin variables whenever we reset xmin. I think in master we should also start to make RecentGlobalXmin etc FullTransactionIds. We can then convert the 32bit xids we compare with RecentGlobal* to 64bit xids (which is safe, because live xids have to be within [oldestXid, nextXid)). I have that as part of another patch anyway... Greetings, Andres Freund