A customer of ours reported a problem in 9.3.14 while inserting tuples in a table with a foreign key, with many concurrent transactions doing the same: after a few insertions completed successfully, a later one would fail with an error indicating that the primary key value was not present in the referenced table. It worked fine for them on 9.3.4.
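For illustration, the failing workload looks roughly like the following. This is a hypothetical sketch (made-up table and column names, not the customer's actual schema), and by itself it is far too simple to reproduce the problem, which needs the concurrency described below; it only shows why every inserter ends up locking the same referenced tuple:

-- Hypothetical sketch of the workload (not the customer's actual schema).
CREATE TABLE parent (id int PRIMARY KEY, note text);
CREATE TABLE child  (id serial PRIMARY KEY,
                     parent_id int NOT NULL REFERENCES parent (id));

INSERT INTO parent VALUES (1, 'initial');

-- Run concurrently in many sessions.  The FK trigger effectively does
-- SELECT 1 FROM parent WHERE id = 1 FOR KEY SHARE, so every inserter
-- locks the same parent tuple; concurrent non-key updates of that row
-- (e.g. UPDATE parent SET note = '...' WHERE id = 1) create the update
-- chain involved in the bug.  With the bug, an insert occasionally fails
-- claiming that key (parent_id)=(1) is not present in "parent".
BEGIN;
INSERT INTO child (parent_id) VALUES (1);
COMMIT;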
After some research, we determined that the problem disappeared if this commit was reverted:

Author: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Branch: master Release: REL9_6_BR [533e9c6b0] 2016-07-15 14:17:20 -0400
Branch: REL9_5_STABLE Release: REL9_5_4 [649dd1b58] 2016-07-15 14:17:20 -0400
Branch: REL9_4_STABLE Release: REL9_4_9 [166873dd0] 2016-07-15 14:17:20 -0400
Branch: REL9_3_STABLE Release: REL9_3_14 [6c243f90a] 2016-07-15 14:17:20 -0400

    Avoid serializability errors when locking a tuple with a committed update

I spent some time writing an isolationtester spec to reproduce the problem. It turned out that this required six concurrent sessions in order for the problem to show up at all, but once I had that, figuring out what was going on was simple: a transaction wants to lock the updated version of some tuple, and it does so; and some other transaction is locking the same tuple concurrently in a compatible way, so both are okay to proceed concurrently. The problem is that if one of them detects that anything changed in the process of doing this (such as the other session updating the multixact to include itself, both having compatible lock modes), it loops back to ensure xmax/infomask are still sane; but heap_lock_updated_tuple_rec is not prepared to deal with the situation of "the current transaction has the lock already", so it returns a failure and the tuple is reported as "not visible", causing the described problem.

I *think* that the problem did not show up before the commit cited above because the bug fixed by that commit reduced concurrency, effectively masking this bug.

In assertion-enabled builds, this happens instead (about 2 out of 3 times I run the script on my laptop):

TRAP: FailedAssertion("!(TransactionIdIsCurrentTransactionId(((!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : ( (tup)->t_choice.t_heap.t_xmax) )))", File: "/pgsql/source/REL9_3_STABLE/src/backend/utils/time/combocid.c", Line: 122)

with this backtrace:

(gdb) bt
#0  0x00007f651cd66067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f651cd67448 in __GI_abort () at abort.c:89
#2  0x0000000000754ac1 in ExceptionalCondition (
    conditionName=conditionName@entry=0x900d68 "!(TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : ( (tup)->t_choice.t_heap.t_xmax"...,
    errorType=errorType@entry=0x78cdb4 "FailedAssertion",
    fileName=fileName@entry=0x900ca8 "/pgsql/source/REL9_3_STABLE/src/backend/utils/time/combocid.c",
    lineNumber=lineNumber@entry=122) at /pgsql/source/REL9_3_STABLE/src/backend/utils/error/assert.c:54
#3  0x0000000000781cf8 in HeapTupleHeaderGetCmax (tup=0x7f6513f289d0)
    at /pgsql/source/REL9_3_STABLE/src/backend/utils/time/combocid.c:122
#4  0x0000000000495911 in heap_lock_tuple (relation=0x7f651e0d9138, tuple=0x7ffe928a7de0, cid=0,
    mode=LockTupleKeyShare, nowait=0 '\000', follow_updates=1 '\001', buffer=0x7ffe928a7dcc, hufd=0x7ffe928a7dd0)
    at /pgsql/source/REL9_3_STABLE/src/backend/access/heap/heapam.c:4439
#5  0x00000000005b2f74 in ExecLockRows (node=0x290f070)
    at /pgsql/source/REL9_3_STABLE/src/backend/executor/nodeLockRows.c:134

The attached patch fixes the problem.
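To make "locking the updated version of some tuple" concrete, here is a deliberately minimal two-session sketch using the epq_lock_t table from the spec attached below. It is nowhere near enough to trigger the bug (which needs six sessions as noted above), but it shows the chain-following that heap_lock_updated_tuple_rec performs:

-- Session 1: non-key update of the row, creating a newer tuple version.
BEGIN;
UPDATE epq_lock_t SET b = 'newer version' WHERE a = 2;

-- Session 2: FOR KEY SHARE does not conflict with the in-progress non-key
-- update, so heap_lock_tuple proceeds and, because follow_updates is true,
-- walks the update chain and also locks the newer tuple version
-- (via heap_lock_updated_tuple_rec).
BEGIN;
SELECT a FROM epq_lock_t WHERE a = 2 FOR KEY SHARE;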
When locking some old tuple version of the chain, if we detect that we already hold that lock (i.e. test_lockmode_for_conflict returns HeapTupleSelfUpdated), do not try to lock it again but instead skip ahead to the next version. This fixes the synthetic case in my isolationtester as well as our customer's production case.

I attach the isolationtester spec file. In its present form it's rather noisy, because I added calls to report the XIDs for each transaction, so that I could exactly replicate the WAL sequence that I obtained from the customer in order to reproduce the issue. Those calls would have to be removed in order for the test to be included in the repository, of course.

This spec doesn't work at all with 9.3's isolationtester, because back then isolationtester had the limitation that only one session could be waiting; to verify this bug, I backpatched 38f8bdcac498 ("Modify the isolation tester so that multiple sessions can wait.") so that the spec would run. This means that we cannot include the test in branches older than 9.6.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d065f5e91c8e2c23debf09252e5f7eb94f99dff4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Tue, 20 Jun 2017 18:15:25 -0400
Subject: [PATCH] if we already hold lock, skip this tuple

---
 src/backend/access/heap/heapam.c | 41 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 898950d9a0..ae0b214519 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4906,8 +4906,10 @@ l5:
  * with the given xid, does the current transaction need to wait, fail, or can
  * it continue if it wanted to acquire a lock of the given mode?  "needwait"
  * is set to true if waiting is necessary; if it can continue, then
- * HeapTupleMayBeUpdated is returned.  In case of a conflict, a different
- * HeapTupleSatisfiesUpdate return code is returned.
+ * HeapTupleMayBeUpdated is returned.  If the lock is already held by the
+ * current transaction, return HeapTupleSelfUpdated.  In case of a conflict
+ * with another transaction, a different HeapTupleSatisfiesUpdate return code
+ * is returned.
  *
  * The held status is said to be hypothetical because it might correspond to a
  * lock held by a single Xid, i.e. not a real MultiXactId; we express it this
@@ -4930,8 +4932,9 @@ test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
 	if (TransactionIdIsCurrentTransactionId(xid))
 	{
 		/*
-		 * Updated by our own transaction? Just return failure.  This shouldn't
-		 * normally happen.
+		 * The tuple has already been locked by our own transaction.  This is
+		 * very rare but can happen if multiple transactions are trying to
+		 * lock an ancient version of the same tuple.
 		 */
 		return HeapTupleSelfUpdated;
 	}
@@ -5100,6 +5103,22 @@ l4:
 											members[i].xid,
 											mode, &needwait);
 
+				/*
+				 * If the tuple was already locked by ourselves in a
+				 * previous iteration of this (say heap_lock_tuple was
+				 * forced to restart the locking loop because of a change
+				 * in xmax), then we hold the lock already on this tuple
+				 * version and we don't need to do anything; and this is
+				 * not an error condition either.  We just need to skip
+				 * this tuple and continue locking the next version in the
+				 * update chain.
+				 */
+				if (res == HeapTupleSelfUpdated)
+				{
+					pfree(members);
+					goto next;
+				}
+
 				if (needwait)
 				{
 					LockBuffer(buf, BUFFER_LOCK_UNLOCK);
@@ -5160,6 +5179,19 @@ l4:
 			res = test_lockmode_for_conflict(status, rawxmax, mode,
 											 &needwait);
+
+			/*
+			 * If the tuple was already locked by ourselves in a previous
+			 * iteration of this (say heap_lock_tuple was forced to
+			 * restart the locking loop because of a change in xmax), then
+			 * we hold the lock already on this tuple version and we don't
+			 * need to do anything; and this is not an error condition
+			 * either.  We just need to skip this tuple and continue
+			 * locking the next version in the update chain.
+			 */
+			if (res == HeapTupleSelfUpdated)
+				goto next;
+
 			if (needwait)
 			{
 				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
@@ -5221,6 +5253,7 @@ l4:
 
 	END_CRIT_SECTION();
 
+next:
 	/* if we find the end of update chain, we're done. */
 	if (mytup.t_data->t_infomask & HEAP_XMAX_INVALID ||
 		ItemPointerEquals(&mytup.t_self, &mytup.t_data->t_ctid) ||
-- 
2.11.0
setup
{
  DROP TABLE IF EXISTS epq_lock_t;
  CREATE TABLE epq_lock_t (a int PRIMARY KEY, b text) WITH (autovacuum_enabled = false);
  ALTER TABLE epq_lock_t ALTER COLUMN b SET STORAGE PLAIN;
  INSERT INTO epq_lock_t SELECT a, md5(a::text) FROM generate_series(1, 260) AS s(a);
  DELETE FROM epq_lock_t WHERE a = 131;
}

teardown
{
  DROP TABLE epq_lock_t;
}

session "z"
step "z0"        { BEGIN }
step "z_report"  { SELECT txid_current(), pg_create_restore_point('z') }
step "z1"        { SELECT a FROM epq_lock_t WHERE a = 2 FOR NO KEY UPDATE }
step "z2"        { UPDATE epq_lock_t SET b = 'updated by z' WHERE a = 2 }
step "z3"        { COMMIT }

session "a"
step "a1"        { BEGIN }
step "a2"        { SELECT a FROM epq_lock_t WHERE a = 2 FOR UPDATE }
step "a_report"  { SELECT pg_create_restore_point('a') }
step "a5_1"      { SAVEPOINT foo; }
step "a5"        { UPDATE epq_lock_t SET b = 'updated by a' WHERE a = 2 RETURNING ctid }
step "a6"        { COMMIT }

session "b"
step "b3"        { BEGIN }
step "b_report"  { SELECT txid_current(), pg_create_restore_point('b') }
step "b4"        { SELECT * FROM epq_lock_t WHERE pg_advisory_xact_lock(123132) IS NOT NULL AND a = 2 FOR KEY SHARE }
step "bN"        { COMMIT }

session "c"
step "c7"        { BEGIN }
step "c_report"  { SELECT txid_current(), pg_create_restore_point('c') }
step "cN"        { SELECT * FROM epq_lock_t WHERE a = 2 FOR UPDATE; }
step "cU"        { SAVEPOINT foo; }
step "c8"        { UPDATE epq_lock_t SET b = 'updated by c' WHERE a = 2 }
step "c18"       { COMMIT }

session "d"
step "d9"        { BEGIN; SELECT txid_current(), pg_create_restore_point('d'); }
step "d9_2"      { SELECT * FROM epq_lock_t WHERE pg_advisory_xact_lock(123133) IS NOT NULL AND a = 2 FOR KEY SHARE }
step "d15"       { aborts trying to lock some other tuple }

session "e"
step "e12"       { BEGIN }
step "e_report"  { SELECT txid_current(), pg_create_restore_point('e') }
step "e13"       { SELECT * FROM epq_lock_t WHERE a = 2 FOR KEY SHARE }
step "e19"       { commit }

session "f"
step "f14"       { BEGIN }
step "f_report"  { SELECT txid_current(), pg_create_restore_point('f') }
step "f15"       { UPDATE epq_lock_t SET b = 'updated by f' WHERE a = 2 }
step "f19"       { commit }

session "g"
step "g16"       { BEGIN }
step "g_report"  { SELECT txid_current(), pg_create_restore_point('g') }
step "g17"       { UPDATE epq_lock_t SET b = 'updated by g' WHERE a = 2 }
step "g19"       { commit }

session "control"
step "vacuum"    { VACUUM epq_lock_t; }
step "curr_lsn"  { SELECT pg_current_xlog_insert_location(); }
step "lock_b"    { SELECT pg_advisory_lock(123132) }
step "unlock_b"  { SELECT pg_advisory_unlock(123132) }
step "lock_d"    { SELECT pg_advisory_lock(123133) }
step "unlock_d"  { SELECT pg_advisory_unlock(123133) }

permutation "vacuum" "curr_lsn" "b3" "b_report" "lock_b" "b4" "z0" "z_report" "z1" "z2" "z3" "a1" "a2" "a_report" "a5_1" "a5" "c7" "c_report" "cN" "a6" "cU" "c8" "unlock_b" "d9" "d9_2" "c18" "bN" "e12" "e_report" "e13" "f14" "f_report" "f15" "g16" "g_report" "g17" "c18" "e19" "f19" "g19" "d15" "curr_lsn"