I think you've done a stellar job of identifying what the actual problem
was.  I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.

freeze-the-dead is not listed in isolation_schedule; an easy fix.
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

                        if (!TransactionIdPrecedes(xmax, OldestXmin))
                                return HEAPTUPLE_RECENTLY_DEAD;

                        return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody).  Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function.  There is no reason for the
"else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 8be2980116..aab03835d1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1324,25 +1324,23 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId 
OldestXmin,
                else if (TransactionIdDidCommit(xmax))
                {
                        /*
-                        * The multixact might still be running due to lockers. 
If the
-                        * updater is below the horizon we have to return DEAD 
regardless
-                        * - otherwise we could end up with a tuple where the 
updater has
-                        * to be removed due to the horizon, but is not pruned 
away. It's
-                        * not a problem to prune that tuple because all the 
lockers will
-                        * also be present in the newer tuple version.
+                        * The multixact might still be running due to lockers. 
 If the
+                        * updater is below the xid horizon, we have to return 
DEAD
+                        * regardless -- otherwise we could end up with a tuple 
where the
+                        * updater has to be removed due to the horizon, but is 
not pruned
+                        * away.  It's not a problem to prune that tuple, 
because any
+                        * remaining lockers will also be present in newer 
tuple versions.
                         */
-                       if (TransactionIdPrecedes(xmax, OldestXmin))
-                       {
-                               return HEAPTUPLE_DEAD;
-                       }
-                       else
+                       if (!TransactionIdPrecedes(xmax, OldestXmin))
                                return HEAPTUPLE_RECENTLY_DEAD;
+
+                       return HEAPTUPLE_DEAD;
                }
                else if 
(!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
                {
                        /*
                         * Not in Progress, Not Committed, so either Aborted or 
crashed.
-                        * Remove the Xmax.
+                        * Mark the Xmax as invalid.
                         */
                        SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, 
InvalidTransactionId);
                }
diff --git a/src/test/isolation/isolation_schedule 
b/src/test/isolation/isolation_schedule
index e41b9164cd..eb566ebb6c 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -44,6 +44,7 @@ test: update-locked-tuple
 test: propagate-lock-delete
 test: tuplelock-conflict
 test: tuplelock-update
+test: freeze-the-dead
 test: nowait
 test: nowait-2
 test: nowait-3

Reply via email to