Hi, I am trying to change the snapshot too old infrastructure so it cooperates with my snapshot scalability patch. While trying to understand the code sufficiently, I think I found a fairly serious issue:
To map the time-based old_snapshot_threshold to an xid that can be used as a cutoff for heap_page_prune(), we maintain a ringbuffer of old_snapshot_threshold + 10 entries in oldSnapshotControlData->xid_by_minute[]. TransactionIdLimitedForOldSnapshots() uses that to (at least that's the goal) increase the horizon used for pruning. The problem is that there's no protection again the xids in the ringbuffer getting old enough to wrap around. Given that practical uses of old_snapshot_threshold are likely to be several hours to several days, that's not particularly hard to hit. That then has the consequence that we can use an xid that's either from "from the future" (if bigger than the current xid), or more recent than appropriate (if it wrapped far enough to be below nextxid, but not yet older than OldestXmin) as the OldestXmin argument to heap_page_prune(). Which in turn means that we can end up pruning much more recently removed rows than intended. While that'd be bad on its own, the big problem is that we won't detect that case on access, in contrast to the way old_snapshot_threshold is intended to work. The reason for that is detecting these errors happens on the basis of timestamps - which obviously do not wrap around. Greetings, Andres Freund