Hi all Andres and I were going over a patch yesterday and found an unexpected bug in tqual.c while attempting to trigger a hypothesized bug in that patch.
A SELECT ... FOR SHARE will incorrectly block on another open transaction that ran SELECT ... FOR SHARE and still holds the locks if it has to follow a ctid chain from the current snapshot through a committed update to a share-locked tuple. This also affects uniqueness checks in btrees, where it can cause unnecessary waiting. It's also an issue with FOR KEY UPDATE, in that it can cause an update to block when it doesn't have to. The attached test case runs under isolationtester to exersise the problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked over the code and says the underlying bug goes back to the commit of MVCC, it's just become easier to trigger. To reliably test this with isolationtester I had to horribly abuse pg_advisory_lock(...) so that I could force the first SELECT ... FOR UPDATE to acquire its snapshot before the UPDATE runs. A backtrace of the point where it's incorrectly blocked is attached. What's happening is that the test for TransactionIdIsInProgress unconditionally sets snapshot->xmax, even if xmax was only set for locking purposes. This will cause the caller to wait for the xid in xmax when it doesn't need to. It should be testing HEAP_XMAX_IS_LOCKED_ONLY and only setting snapshot->xmax if the tuple is really being deleted by an open tx. Note that HeapTupleSatisfiesDirty tests the infomask for HEAP_XMAX_IS_MULTI and handles multixacts separately, and in that case it _does_ already test HEAP_XMAX_IS_LOCKED_ONLY. When you run the attached test case it should block indefinitely. After applying the attached patch it'll return as expected. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
# SELECT ... FOR UPDATE SKIP LOCKED # # Sanity checks for SELECT ... FOR UPDATE SKIP LOCKED. setup { DROP TABLE IF EXISTS skip_locked; CREATE TABLE skip_locked ( id integer PRIMARY KEY, value integer not null ); INSERT INTO skip_locked SELECT x,x FROM generate_series(1,10) x; } teardown { DROP TABLE skip_locked; } # The session that does the SKIP LOCKED queries session "sl1" step "sl1_begin" { BEGIN ISOLATION LEVEL READ COMMITTED; } step "sl1_selectblocking" { SELECT xmin, xmax, ctid, * FROM skip_locked WHERE pg_advisory_lock(0) is not null FOR SHARE NOWAIT; } teardown { COMMIT; } # A session that's used for an UPDATE of the rows to be locked, for when we're testing ctid # chain following. session "upd" step "upd_getlock" { SELECT pg_advisory_lock(0); } step "upd_doupdate" { BEGIN ISOLATION LEVEL READ COMMITTED; UPDATE skip_locked SET value = value WHERE id % 2 = 0; COMMIT; } step "upd_releaselock" { SELECT pg_advisory_unlock(0); } # A session that acquires locks that sl1 is supposed to skip session "lk1" step "lk1_doforshare" { BEGIN ISOLATION LEVEL READ COMMITTED; SELECT id FROM skip_locked WHERE id % 2 = 0 FOR SHARE; } teardown { COMMIT; } permutation "upd_getlock" "sl1_begin" "sl1_selectblocking" "upd_doupdate" "lk1_doforshare" "upd_releaselock"
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c new file mode 100644 index da2ce18..0c72115 *** a/src/backend/utils/time/tqual.c --- b/src/backend/utils/time/tqual.c *************** HeapTupleSatisfiesDirty(HeapTuple htup, *** 1012,1018 **** if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple))) { ! snapshot->xmax = HeapTupleHeaderGetRawXmax(tuple); return true; } --- 1012,1019 ---- if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple))) { ! if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask)) ! snapshot->xmax = HeapTupleHeaderGetRawXmax(tuple); return true; }
#0 0x0000003e46af6937 in semop () at ../sysdeps/unix/syscall-template.S:81 #1 0x0000000000600697 in PGSemaphoreLock (sema=0x7fe7027674d0, interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:415 #2 0x000000000063e5c6 in ProcSleep (locallock=locallock@entry=0x1e30560, lockMethodTable=lockMethodTable@entry=0x85e4c0 <default_lockmethod>) at proc.c:1092 #3 0x00000000006398f9 in WaitOnLock (locallock=locallock@entry=0x1e30560, owner=owner@entry=0x1ef6e28) at lock.c:1586 #4 0x000000000063abeb in LockAcquireExtended (locktag=locktag@entry=0x7fff26f0d680, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:957 #5 0x000000000063afb1 in LockAcquire (locktag=locktag@entry=0x7fff26f0d680, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:662 #6 0x0000000000639350 in XactLockTableWait (xid=124785) at lmgr.c:495 #7 0x000000000057c0e8 in EvalPlanQualFetch (estate=estate@entry=0x1f13e60, relation=0x7fe70018f1d8, lockmode=lockmode@entry=0, tid=tid@entry=0x7fff26f0d7c0, priorXmax=<optimized out>) at execMain.c:1835 #8 0x000000000059063d in ExecLockRows (node=node@entry=0x1f14070) at nodeLockRows.c:147 #9 0x000000000057ce48 in ExecProcNode (node=node@entry=0x1f14070) at execProcnode.c:495 #10 0x000000000057a8e0 in ExecutePlan (dest=0x1f0f908, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x1f14070, estate=0x1f13e60) at execMain.c:1395 #11 standard_ExecutorRun (queryDesc=0x1e34030, direction=<optimized out>, count=0) at execMain.c:303 #12 0x000000000064eb50 in PortalRunSelect (portal=portal@entry=0x1e31000, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x1f0f908) at pquery.c:944 #13 0x000000000064fe8b in PortalRun (portal=0x1e31000, count=9223372036854775807, isTopLevel=<optimized out>, dest=0x1f0f908, altdest=0x1f0f908, completionTag=0x7fff26f0dcd0 "") at pquery.c:788 #14 0x000000000064be96 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1046 #15 0x000000000060d7be in BackendRun (port=0x1e352a0) at postmaster.c:3614 #16 BackendStartup (port=0x1e352a0) at postmaster.c:3304 #17 ServerLoop () at postmaster.c:1367 #18 0x000000000060e42c in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x1e0f320) at postmaster.c:1127 #19 0x000000000045b7c9 in main (argc=5, argv=0x1e0f320) at main.c:199 #0 0x0000003e46af6937 in semop () at ../sysdeps/unix/syscall-template.S:81 No locals. #1 0x0000000000600697 in PGSemaphoreLock (sema=0x7fe7027674d0, interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:415 errStatus = <optimized out> sops = {sem_num = 3, sem_op = -1, sem_flg = 0} __func__ = "PGSemaphoreLock" #2 0x000000000063e5c6 in ProcSleep (locallock=locallock@entry=0x1e30560, lockMethodTable=lockMethodTable@entry=0x85e4c0 <default_lockmethod>) at proc.c:1092 lockmode = 5 lock = 0x7fe7024933f8 proclock = <optimized out> hashcode = <optimized out> partitionLock = 53 waitQueue = 0x7fe702493420 myHeldLocks = <optimized out> early_deadlock = <optimized out> allow_autovacuum_cancel = 1 '\001' myWaitStatus = <optimized out> proc = <optimized out> i = <optimized out> __func__ = "ProcSleep" #3 0x00000000006398f9 in WaitOnLock (locallock=locallock@entry=0x1e30560, owner=owner@entry=0x1ef6e28) at lock.c:1586 save_exception_stack = 0x7fff26f0d970 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {32468520, 934449783848447161, 53, 140630153119160, 140733846705792, 5, 934449784732921017, -934048294823514951}, __mask_was_saved = 0, __saved_mask = {__val = {0, 0, 140630152524792, 2790716164, 140630155490496, 5, 8774864, 5, 6529124, 47, 0, 140630152524792, 140630155490496, 456, 0, 31655264}}}} lockmethodid = <optimized out> lockMethodTable = 0x85e4c0 <default_lockmethod> new_status = 0x1f16e60 "SELECT" #4 0x000000000063abeb in LockAcquireExtended (locktag=locktag@entry=0x7fff26f0d680, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:957 lockmethodid = <optimized out> lockMethodTable = 0x85e4c0 <default_lockmethod> localtag = {lock = {locktag_field1 = 124785, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}, mode = 5} locallock = <optimized out> lock = 0x7fe7024933f8 proclock = 0x7fe7025245b8 found = 0 '\000' owner = 0x1ef6e28 hashcode = 2790716164 partitionLock = 53 status = <optimized out> log_lock = 0 '\000' __func__ = "LockAcquireExtended" #5 0x000000000063afb1 in LockAcquire (locktag=locktag@entry=0x7fff26f0d680, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:662 No locals. #6 0x0000000000639350 in XactLockTableWait (xid=124785) at lmgr.c:495 tag = {locktag_field1 = 124785, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'} #7 0x000000000057c0e8 in EvalPlanQualFetch (estate=estate@entry=0x1f13e60, relation=0x7fe70018f1d8, lockmode=lockmode@entry=0, tid=tid@entry=0x7fff26f0d7c0, priorXmax=<optimized out>) at execMain.c:1835 test = <optimized out> update_ctid = {ip_blkid = {bi_hi = 55232, bi_lo = 9968}, ip_posid = 32767} update_xmax = 0 buffer = 417 copyTuple = 0x0 tuple = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 11}, t_tableOid = 365619, t_data = 0x7fe7006fec20} SnapshotDirty = {satisfies = 0x734d90 <HeapTupleSatisfiesDirty>, xmin = 0, xmax = 124785, xip = 0x0, xcnt = 0, subxcnt = 0, subxip = 0x0, suboverflowed = 96 '`', takenDuringRecovery = 66 'B', copied = -15 '\361', curcid = 0, active_count = 1, regd_count = 0} __func__ = "EvalPlanQualFetch" #8 0x000000000059063d in ExecLockRows (node=node@entry=0x1f14070) at nodeLockRows.c:147 aerm = <optimized out> isNull = 0 '\000' buffer = 417 erm = 0x1f13ff0 tuple = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 2}, t_tableOid = 365619, t_data = 0x7fe7006fed40} update_xmax = 124784 lockmode = LockTupleShared copyTuple = <optimized out> datum = <optimized out> update_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 11} test = <optimized out> slot = 0x1f15060 estate = 0x1f13e60 outerPlan = 0x1f14260 epq_started = 0 '\000' lc = 0x1f162e8 __func__ = "ExecLockRows" #9 0x000000000057ce48 in ExecProcNode (node=node@entry=0x1f14070) at execProcnode.c:495 result = <optimized out> __func__ = "ExecProcNode" #10 0x000000000057a8e0 in ExecutePlan (dest=0x1f0f908, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x1f14070, estate=0x1f13e60) at execMain.c:1395 slot = <optimized out> current_tuple_count = 1 #11 standard_ExecutorRun (queryDesc=0x1e34030, direction=<optimized out>, count=0) at execMain.c:303 estate = 0x1f13e60 operation = CMD_SELECT dest = 0x1f0f908 sendTuples = <optimized out> oldcontext = 0x1ef6c88 #12 0x000000000064eb50 in PortalRunSelect (portal=portal@entry=0x1e31000, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x1f0f908) at pquery.c:944 queryDesc = 0x1e34030 direction = <optimized out> nprocessed = <optimized out> __func__ = "PortalRunSelect" #13 0x000000000064fe8b in PortalRun (portal=0x1e31000, count=9223372036854775807, isTopLevel=<optimized out>, dest=0x1f0f908, altdest=0x1f0f908, completionTag=0x7fff26f0dcd0 "") at pquery.c:788 save_exception_stack = 0x7fff26f0dbe0 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {32233432, -934044991753147207, 32233944, 32569608, 2, 32176976, 934449784546274489, -934048285353825095}, __mask_was_saved = 0, __saved_mask = {__val = {7438292, 32176976, 32158096, 144, 31657984, 8956638, 80, 31524864, 32233944, 8709199, 2, 32176976, 7512933, 32233432, 2, 31657984}}}} result = <optimized out> nprocessed = <optimized out> saveTopTransactionResourceOwner = 0x1eafc60 saveTopTransactionContext = 0x1eafb50 saveActivePortal = 0x0 saveResourceOwner = 0x1eafc60 savePortalContext = 0x0 saveMemoryContext = 0x1eafb50 __func__ = "PortalRun" #14 0x000000000064be96 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1046 firstchar = 32233432 input_message = {data = 0x1ebc6a0 "\n\tSELECT xmin, xmax, ctid, * FROM skip_locked WHERE pg_advisory_lock(0) is not null FOR SHARE NOWAIT;\n", len = 103, maxlen = 1024, cursor = 103} local_sigjmp_buf = {{__jmpbuf = {31527232, -934045008961327943, 1, 11808576, 0, 1, 934449784523205817, -934048287395402567}, __mask_was_saved = 1, __saved_mask = {__val = {0, 11808576, 0, 60, 267473953272, 2, 206158430256, 140733846707520, 140733846707312, 0, 11808576, 0, 8192, 32, 31527232, 0}}}} send_ready_for_query = 0 '\000' __func__ = "PostgresMain" #15 0x000000000060d7be in BackendRun (port=0x1e352a0) at postmaster.c:3614 ac = 1 secs = 428732844 usecs = 503164 i = <optimized out> av = 0x1e11178 maxac = <optimized out> #16 BackendStartup (port=0x1e352a0) at postmaster.c:3304 bn = 0x1e35260 pid = 0 #17 ServerLoop () at postmaster.c:1367 rmask = {fds_bits = {32, 0 <repeats 15 times>}} selres = <optimized out> readmask = {fds_bits = {56, 0 <repeats 15 times>}} now = <optimized out> last_touch_time = 1375414727 __func__ = "ServerLoop" #18 0x000000000060e42c in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x1e0f320) at postmaster.c:1127 opt = <optimized out> status = <optimized out> userDoption = <optimized out> listen_addr_saved = <optimized out> i = <optimized out> __func__ = "PostmasterMain" #19 0x000000000045b7c9 in main (argc=5, argv=0x1e0f320) at main.c:199 No locals.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers