On 06/08/2019 07:20, Thomas Munro wrote:
On Tue, Aug 6, 2019 at 4:35 AM Andres Freund <and...@anarazel.de> wrote:
On 2019-08-05 20:58:05 +1200, Thomas Munro wrote:
1.  Commit dafaa3efb75 "Implement genuine serializable isolation
level." (2011) locked the root tuple, because it set t_self to *tid.
Possibly due to confusion about the effect of the preceding line
ItemPointerSetOffsetNumber(tid, offnum).

2.  Commit 81fbbfe3352 "Fix bugs in SSI tuple locking." (2013)
fixed that by adding ItemPointerSetOffsetNumber(&heapTuple->t_self,
offnum).

Hm. I'm not at all sure that it's ok to report the non-root tuple tid
here. I.e. I'm fairly sure there was a reason not to just set it to the
actual tid. I think I might have written that up on the list at some
point. Let me dig through memory and the list archives. Obviously it's
possible that that was also obsoleted by parallel changes.

Adding Heikki and Kevin.

I haven't found your earlier discussion about that yet, but would be
keen to read it if you can find it.  I wondered if your argument
might have had something to do with heap pruning, but I can't find a
problem there.  It's not as though the TID of any visible tuples
change, it's just that some dead stuff goes away and the root line
pointer is changed to LP_REDIRECT if it wasn't already.

As for the argument for locking the tuple we emit rather than the HOT
root, I think the questions are: (1) How exactly do we get away with
locking only one version in a regular non-HOT update chain?  (2) Is it
OK to do that with a HOT root?

The answer to the first question is given in README-SSI[1].
Unfortunately it doesn't discuss HOT directly, but I suspect the
answer is no, HOT is not special here.  By my reading, it relies on
the version you lock being the version visible to your snapshot, which
is important because later updates have to touch that tuple to write
the next version.  That doesn't apply to some arbitrarily older tuple
that happens to be a HOT root.  Concretely, heap_update() does
CheckForSerializableConflictIn(relation, &oldtup, buffer), which is
only going to produce a rw conflict if T1 took an SIREAD on precisely
the version T2 locks in that path, not some arbitrarily older version
that happens to be a HOT root.  A HOT root might never be considered
again by concurrent writers, no?

Your analysis is spot on. Thanks for the clear write-up!
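Concretely, the kind of schedule that goes undetected is something like this (an informal sketch in plain SQL; table and values are just for illustration):

-- Setup: row 7 has already been HOT-updated, so the version visible to
-- both transactions is not the HOT root.
CREATE TABLE test (i int PRIMARY KEY, t text);
INSERT INTO test VALUES (5, 'apple'), (7, 'pear');
UPDATE test SET t = 'pear_hot_updated' WHERE i = 7;  -- no indexed columns change, so HOT

-- Then, interleaved across two serializable sessions:
-- S1: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- S2: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- S1: SELECT * FROM test WHERE i IN (5, 7);           -- SIREAD-locks both rows
-- S2: SELECT * FROM test WHERE i IN (5, 7);           -- SIREAD-locks both rows
-- S1: UPDATE test SET t = 'pear_xact1'  WHERE i = 7;  -- rw-conflict with S2's read of row 7
-- S2: UPDATE test SET t = 'apple_xact2' WHERE i = 5;  -- rw-conflict with S1's read of row 5
-- S1: COMMIT;
-- S2: COMMIT;  -- must fail with "could not serialize access due to
--              -- read/write dependencies among transactions"

With the bug, the SIREAD locks from the reads of row 7 end up on the HOT root's tid, so S1's UPDATE of row 7 never matches S2's lock, that rw-dependency is missed, and both transactions commit.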

This must be in want of an isolation test, but I haven't yet tried to
get my head around how to write a test that would show the difference.

Indeed.

One practical problem is that the only way to reach
PredicateLockTuple() is from an index scan, and the index scan locks
the index page (or the whole index, depending on
rd_indam->ampredlocks).  So I think if you want to see a serialization
anomaly you'll need multiple indexes (so that index page locks don't
hide the problem), a long enough HOT chain and then probably several
transactions to be able to miss a cycle that should be picked up by
the logic in [1].  I'm out of steam for this problem today though.

I had some steam, and wrote a spec that reproduces this bug. It wasn't actually that hard to reproduce, fortunately. Or unfortunately; people might well be hitting it in production. I used the "freezetest.spec" from the 2013 thread as the starting point, and added one UPDATE to the initialization, so that the test starts with an already HOT-updated tuple. It should throw a serialization error, but on current master, it does not. After applying your fix.txt, it does.

Your fix.txt seems correct. For clarity, I'd prefer moving things around a bit, though, so that t_self is set earlier in the function, at the same place where the other HeapTuple fields are set, and so that blkno and offnum are set together, in one ItemPointerSet call. With that, I'm not sure we need such a verbose comment explaining why t_self needs to be updated, but I kept it for now.

Attached is a patch that contains your fix.txt with the changes for clarity mentioned above, and an isolationtester test case.

PS. Because heap_hot_search_buffer() now always sets heapTuple->t_self to the returned tuple version, updating *tid is redundant. And the call in heapam_index_fetch_tuple() wouldn't need to do "bslot->base.tupdata.t_self = *tid".
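For illustration, roughly like this (an untested sketch from memory of the call site, not something the attached patch changes):

	/* heapam_index_fetch_tuple(): possible follow-up cleanup */
	got_heap_tuple = heap_hot_search_buffer(tid,
											hscan->xs_base.rel,
											hscan->xs_cbuf,
											snapshot,
											&bslot->base.tupdata,
											all_dead,
											!*call_again);
	/* redundant now, t_self already points at the returned version: */
	/* bslot->base.tupdata.t_self = *tid; */

Dropping the *tid update inside heap_hot_search_buffer() itself would additionally require checking that no remaining caller depends on it.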

- Heikki
From 23d07e6d5b259e1fd2fe7694cde51212920fbb3a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Tue, 6 Aug 2019 12:14:14 +0300
Subject: [PATCH 1/1] Fix predicate-locking of HOT updated rows.

In serializable mode, heap_hot_search_buffer() incorrectly acquired a
predicate lock on the root tuple, not the returned tuple that satisfied
the visibility checks. As explained in README-SSI, the predicate lock does
not need to be copied or extended to other tuple versions, but for that to
work, the correct, visible, tuple version must be locked in the first
place.

The original SSI commit had this bug in it, but it was fixed back in 2013,
in commit 81fbbfe335. But unfortunately, it was reintroduced a few months
later in commit b89e151054. Wising up from that, add a regression test
to cover this, so that it doesn't get reintroduced. Also, move the code
that sets 't_self', so that it happens at the same time that the other
HeapTuple fields are set, to make it clearer that all the code in the
loop operates on the "current" tuple in the chain, not the root tuple.

Bug spotted by Andres Freund, analysis and original fix by Thomas Munro,
test case and some additional changes to the fix by Heikki Linnakangas.

Discussion: https://www.postgresql.org/message-id/20190731210630.nqhszuktygwftjty%40alap3.anarazel.de
---
 src/backend/access/heap/heapam.c              | 27 ++++++--------
 .../expected/predicate-lock-hot-tuple.out     | 20 ++++++++++
 src/test/isolation/isolation_schedule         |  1 +
 .../specs/predicate-lock-hot-tuple.spec       | 37 +++++++++++++++++++
 4 files changed, 69 insertions(+), 16 deletions(-)
 create mode 100644 src/test/isolation/expected/predicate-lock-hot-tuple.out
 create mode 100644 src/test/isolation/specs/predicate-lock-hot-tuple.spec

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e33f019939..e81a390ec6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1529,6 +1529,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 {
 	Page		dp = (Page) BufferGetPage(buffer);
 	TransactionId prev_xmax = InvalidTransactionId;
+	BlockNumber blkno;
 	OffsetNumber offnum;
 	bool		at_chain_start;
 	bool		valid;
@@ -1538,14 +1539,13 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 	if (all_dead)
 		*all_dead = first_call;
 
-	Assert(TransactionIdIsValid(RecentGlobalXmin));
-
-	Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
+	blkno = ItemPointerGetBlockNumber(tid);
 	offnum = ItemPointerGetOffsetNumber(tid);
 	at_chain_start = first_call;
 	skip = !first_call;
 
-	heapTuple->t_self = *tid;
+	Assert(TransactionIdIsValid(RecentGlobalXmin));
+	Assert(BufferGetBlockNumber(buffer) == blkno);
 
 	/* Scan through possible multiple members of HOT-chain */
 	for (;;)
@@ -1573,10 +1573,16 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			break;
 		}
 
+		/*
+		 * Update heapTuple to point to the element of the HOT chain we're
+		 * currently investigating. Having t_self set correctly is important
+		 * because the SSI checks and the *Satisfies routine for historical
+		 * MVCC snapshots need the correct tid to decide about the visibility.
+		 */
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
 		heapTuple->t_tableOid = RelationGetRelid(relation);
-		ItemPointerSetOffsetNumber(&heapTuple->t_self, offnum);
+		ItemPointerSet(&heapTuple->t_self, blkno, offnum);
 
 		/*
 		 * Shouldn't see a HEAP_ONLY tuple at chain start.
@@ -1602,21 +1608,10 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 		 */
 		if (!skip)
 		{
-			/*
-			 * For the benefit of logical decoding, have t_self point at the
-			 * element of the HOT chain we're currently investigating instead
-			 * of the root tuple of the HOT chain. This is important because
-			 * the *Satisfies routine for historical mvcc snapshots needs the
-			 * correct tid to decide about the visibility in some cases.
-			 */
-			ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
-
 			/* If it's visible per the snapshot, we must return it */
 			valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
 			CheckForSerializableConflictOut(valid, relation, heapTuple,
 											buffer, snapshot);
-			/* reset to original, non-redirected, tid */
-			heapTuple->t_self = *tid;
 
 			if (valid)
 			{
diff --git a/src/test/isolation/expected/predicate-lock-hot-tuple.out b/src/test/isolation/expected/predicate-lock-hot-tuple.out
new file mode 100644
index 0000000000..d1c69bbbd0
--- /dev/null
+++ b/src/test/isolation/expected/predicate-lock-hot-tuple.out
@@ -0,0 +1,20 @@
+Parsed test spec with 2 sessions
+
+starting permutation: b1 b2 r1 r2 w1 w2 c1 c2
+step b1: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step b2: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step r1: SELECT * FROM test WHERE i IN (5, 7)
+i              t              
+
+5              apple          
+7              pear_hot_updated
+step r2: SELECT * FROM test WHERE i IN (5, 7)
+i              t              
+
+5              apple          
+7              pear_hot_updated
+step w1: UPDATE test SET t = 'pear_xact1' WHERE i = 7
+step w2: UPDATE test SET t = 'apple_xact2' WHERE i = 5
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 009c2ec54b..69ae227953 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,7 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: predicate-lock-hot-tuple
 test: deadlock-simple
 test: deadlock-hard
 test: deadlock-soft
diff --git a/src/test/isolation/specs/predicate-lock-hot-tuple.spec b/src/test/isolation/specs/predicate-lock-hot-tuple.spec
new file mode 100644
index 0000000000..d16fb60533
--- /dev/null
+++ b/src/test/isolation/specs/predicate-lock-hot-tuple.spec
@@ -0,0 +1,37 @@
+# Test predicate locks on HOT updated tuples.
+#
+# This test has two serializable transactions. Both select two rows
+# from the table, and then each updates one of them.
+# If the transactions were run serially (one at a time), the one that
+# runs later would see the other's update to one of the rows it read.
+#
+# Any overlap between the transactions must cause a serialization failure.
+# We used to have a bug in predicate locking of HOT updated tuples, which
+# caused the conflict to be missed if the row was HOT updated.
+
+setup
+{
+  CREATE TABLE test (i int PRIMARY KEY, t text);
+  INSERT INTO test VALUES (5, 'apple'), (7, 'pear'), (11, 'banana');
+  -- HOT-update 'pear' row.
+  UPDATE test SET t = 'pear_hot_updated' WHERE i = 7;
+}
+
+teardown
+{
+  DROP TABLE test;
+}
+
+session "s1"
+step "b1" { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step "r1" { SELECT * FROM test WHERE i IN (5, 7) }
+step "w1" { UPDATE test SET t = 'pear_xact1' WHERE i = 7 }
+step "c1" { COMMIT; }
+
+session "s2"
+step "b2" { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step "r2" { SELECT * FROM test WHERE i IN (5, 7) }
+step "w2" { UPDATE test SET t = 'apple_xact2' WHERE i = 5 }
+step "c2" { COMMIT; }
+
+permutation "b1" "b2" "r1" "r2" "w1" "w2" "c1" "c2"
-- 
2.20.1
