I think you've done a stellar job of identifying what the actual problem was. I like the new (simpler) coding of that portion of HeapTupleSatisfiesVacuum.
freeze-the-dead is not listed in isolation_schedule; an easy fix. I confirm that the test crashes with an assertion failure without the code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

    if (!TransactionIdPrecedes(xmax, OldestXmin))
        return HEAPTUPLE_RECENTLY_DEAD;
    return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since the OldestXmin value itself is supposed to be still possibly visible to somebody). Also, this way it is consistent with the other comparison to OldestXmin at the bottom of the function. There is no reason for the "else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 8be2980116..aab03835d1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1324,25 +1324,23 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 		else if (TransactionIdDidCommit(xmax))
 		{
 			/*
-			 * The multixact might still be running due to lockers. If the
-			 * updater is below the horizon we have to return DEAD regardless
-			 * - otherwise we could end up with a tuple where the updater has
-			 * to be removed due to the horizon, but is not pruned away. It's
-			 * not a problem to prune that tuple because all the lockers will
-			 * also be present in the newer tuple version.
+			 * The multixact might still be running due to lockers. If the
+			 * updater is below the xid horizon, we have to return DEAD
+			 * regardless -- otherwise we could end up with a tuple where the
+			 * updater has to be removed due to the horizon, but is not pruned
+			 * away. It's not a problem to prune that tuple, because any
+			 * remaining lockers will also be present in newer tuple versions.
 			 */
-			if (TransactionIdPrecedes(xmax, OldestXmin))
-			{
-				return HEAPTUPLE_DEAD;
-			}
-			else
+			if (!TransactionIdPrecedes(xmax, OldestXmin))
 				return HEAPTUPLE_RECENTLY_DEAD;
+
+			return HEAPTUPLE_DEAD;
 		}
 		else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
 		{
 			/*
 			 * Not in Progress, Not Committed, so either Aborted or crashed.
-			 * Remove the Xmax.
+			 * Mark the Xmax as invalid.
 			 */
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
 		}
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e41b9164cd..eb566ebb6c 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -44,6 +44,7 @@ test: update-locked-tuple
 test: propagate-lock-delete
 test: tuplelock-conflict
 test: tuplelock-update
+test: freeze-the-dead
 test: nowait
 test: nowait-2
 test: nowait-3
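
For illustration only, the boundary behavior of the proposed check can be sketched as a standalone C fragment. This is not code from the patch: every sketch_* name and the SketchResult enum are hypothetical stand-ins, and sketch_xid_precedes only approximates TransactionIdPrecedes for normal XIDs (the special XIDs that the real function handles separately are ignored here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
typedef uint32_t SketchXid;

typedef enum
{
	SKETCH_DEAD,
	SKETCH_RECENTLY_DEAD
} SketchResult;

/*
 * Simplified circular XID comparison: a precedes b when the signed
 * 32-bit difference is negative.  Assumes both values are normal XIDs.
 */
bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Shape of the proposed check for a committed updater: only an xmax
 * strictly older than the horizon is DEAD; an xmax equal to the horizon
 * is still RECENTLY_DEAD.
 */
SketchResult
sketch_classify(SketchXid xmax, SketchXid oldest_xmin)
{
	if (!sketch_xid_precedes(xmax, oldest_xmin))
		return SKETCH_RECENTLY_DEAD;

	return SKETCH_DEAD;
}

/* Walk the three interesting cases around the horizon, plus wraparound. */
void
sketch_self_check(void)
{
	assert(sketch_classify(99, 100) == SKETCH_DEAD);
	assert(sketch_classify(100, 100) == SKETCH_RECENTLY_DEAD);
	assert(sketch_classify(101, 100) == SKETCH_RECENTLY_DEAD);
	/* An xmax just before wraparound still precedes a small horizon. */
	assert(sketch_classify(UINT32_C(4294967290), 5) == SKETCH_DEAD);
}
```

Calling sketch_self_check() exercises the case discussed in the review: an xmax exactly equal to the horizon is classified as RECENTLY_DEAD, and only a strictly older xmax is DEAD.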