Hi all

Andres and I were going over a patch yesterday and found an unexpected
bug in tqual.c while attempting to trigger a hypothesized bug in that patch.

A SELECT ... FOR SHARE will incorrectly block on another open
transaction that ran SELECT ... FOR SHARE and still holds the locks if
it has to follow a ctid chain from the current snapshot through a
committed update to a share-locked tuple.

This also affects uniqueness checks in btrees, where it can cause
unnecessary waiting. It's also an issue with FOR KEY UPDATE, in that it
can cause an update to block when it doesn't have to.

The attached test case runs under isolationtester to exersise the
problem. I've tested it against 9.2, 9.3, and HEAD, but Andres looked
over the code and says the underlying bug goes back to the commit of
MVCC, it's just become easier to trigger. To reliably test this with
isolationtester I had to horribly abuse pg_advisory_lock(...) so that I
could force the first SELECT ... FOR UPDATE to acquire its snapshot
before the UPDATE runs.

A backtrace of the point where it's incorrectly blocked is attached.
What's happening is that the test for TransactionIdIsInProgress
unconditionally sets snapshot->xmax, even if xmax was only set for
locking purposes. This will cause the caller to wait for the xid in xmax
when it doesn't need to.

It should be testing HEAP_XMAX_IS_LOCKED_ONLY and only setting
snapshot->xmax if the tuple is really being deleted by an open tx.

Note that HeapTupleSatisfiesDirty tests the infomask for
HEAP_XMAX_IS_MULTI and handles multixacts separately, and in that case
it _does_ already test HEAP_XMAX_IS_LOCKED_ONLY.

When you run the attached test case it should block indefinitely. After
applying the attached patch it'll return as expected.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
# SELECT ... FOR UPDATE SKIP LOCKED
#
# Sanity checks for SELECT ... FOR UPDATE SKIP LOCKED.

setup
{

  DROP TABLE IF EXISTS skip_locked;
  CREATE TABLE skip_locked (
	id integer PRIMARY KEY,
	value integer not null
  );

  INSERT INTO skip_locked
  SELECT x,x FROM generate_series(1,10) x;
}

teardown
{
  DROP TABLE skip_locked;
}

# The session that does the SKIP LOCKED queries
session "sl1"
step 	"sl1_begin" {
	BEGIN ISOLATION LEVEL READ COMMITTED;
}
step	"sl1_selectblocking" {
	SELECT xmin, xmax, ctid, * FROM skip_locked WHERE pg_advisory_lock(0) is not null FOR SHARE NOWAIT;
}
teardown                    { COMMIT; }

# A session that's used for an UPDATE of the rows to be locked, for when we're testing ctid
# chain following.
session "upd"
step	"upd_getlock" {
	SELECT pg_advisory_lock(0);
}
step    "upd_doupdate" {
	BEGIN ISOLATION LEVEL READ COMMITTED;
	UPDATE skip_locked SET value = value WHERE id % 2 = 0;
	COMMIT;
}
step 	"upd_releaselock" {
	SELECT pg_advisory_unlock(0);
}

# A session that acquires locks that sl1 is supposed to skip
session "lk1"
step    "lk1_doforshare" {
	BEGIN ISOLATION LEVEL READ COMMITTED;
	SELECT id FROM skip_locked WHERE id % 2 = 0 FOR SHARE;
}
teardown {
	COMMIT;
}

permutation "upd_getlock" "sl1_begin" "sl1_selectblocking" "upd_doupdate" "lk1_doforshare" "upd_releaselock"
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
new file mode 100644
index da2ce18..0c72115
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
*************** HeapTupleSatisfiesDirty(HeapTuple htup,
*** 1012,1018 ****
  
  	if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
  	{
! 		snapshot->xmax = HeapTupleHeaderGetRawXmax(tuple);
  		return true;
  	}
  
--- 1012,1019 ----
  
  	if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
  	{
! 		if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
! 			snapshot->xmax = HeapTupleHeaderGetRawXmax(tuple);
  		return true;
  	}
  
#0  0x0000003e46af6937 in semop () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000600697 in PGSemaphoreLock (sema=0x7fe7027674d0, 
interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:415
#2  0x000000000063e5c6 in ProcSleep (locallock=locallock@entry=0x1e30560, 
lockMethodTable=lockMethodTable@entry=0x85e4c0 <default_lockmethod>) at 
proc.c:1092
#3  0x00000000006398f9 in WaitOnLock (locallock=locallock@entry=0x1e30560, 
owner=owner@entry=0x1ef6e28) at lock.c:1586
#4  0x000000000063abeb in LockAcquireExtended 
(locktag=locktag@entry=0x7fff26f0d680, lockmode=lockmode@entry=5, 
sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', 
reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:957
#5  0x000000000063afb1 in LockAcquire (locktag=locktag@entry=0x7fff26f0d680, 
lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', 
dontWait=dontWait@entry=0 '\000') at lock.c:662
#6  0x0000000000639350 in XactLockTableWait (xid=124785) at lmgr.c:495
#7  0x000000000057c0e8 in EvalPlanQualFetch (estate=estate@entry=0x1f13e60, 
relation=0x7fe70018f1d8, lockmode=lockmode@entry=0, 
tid=tid@entry=0x7fff26f0d7c0, priorXmax=<optimized out>) at execMain.c:1835
#8  0x000000000059063d in ExecLockRows (node=node@entry=0x1f14070) at 
nodeLockRows.c:147
#9  0x000000000057ce48 in ExecProcNode (node=node@entry=0x1f14070) at 
execProcnode.c:495
#10 0x000000000057a8e0 in ExecutePlan (dest=0x1f0f908, direction=<optimized 
out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, 
planstate=0x1f14070, estate=0x1f13e60) at execMain.c:1395
#11 standard_ExecutorRun (queryDesc=0x1e34030, direction=<optimized out>, 
count=0) at execMain.c:303
#12 0x000000000064eb50 in PortalRunSelect (portal=portal@entry=0x1e31000, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x1f0f908) at pquery.c:944
#13 0x000000000064fe8b in PortalRun (portal=0x1e31000, 
count=9223372036854775807, isTopLevel=<optimized out>, dest=0x1f0f908, 
altdest=0x1f0f908, completionTag=0x7fff26f0dcd0 "") at pquery.c:788
#14 0x000000000064be96 in PostgresMain (argc=<optimized out>, argv=<optimized 
out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1046
#15 0x000000000060d7be in BackendRun (port=0x1e352a0) at postmaster.c:3614
#16 BackendStartup (port=0x1e352a0) at postmaster.c:3304
#17 ServerLoop () at postmaster.c:1367
#18 0x000000000060e42c in PostmasterMain (argc=argc@entry=5, 
argv=argv@entry=0x1e0f320) at postmaster.c:1127
#19 0x000000000045b7c9 in main (argc=5, argv=0x1e0f320) at main.c:199



#0  0x0000003e46af6937 in semop () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x0000000000600697 in PGSemaphoreLock (sema=0x7fe7027674d0, 
interruptOK=interruptOK@entry=1 '\001') at pg_sema.c:415
        errStatus = <optimized out>
        sops = {sem_num = 3, sem_op = -1, sem_flg = 0}
        __func__ = "PGSemaphoreLock"
#2  0x000000000063e5c6 in ProcSleep (locallock=locallock@entry=0x1e30560, 
lockMethodTable=lockMethodTable@entry=0x85e4c0 <default_lockmethod>) at 
proc.c:1092
        lockmode = 5
        lock = 0x7fe7024933f8
        proclock = <optimized out>
        hashcode = <optimized out>
        partitionLock = 53
        waitQueue = 0x7fe702493420
        myHeldLocks = <optimized out>
        early_deadlock = <optimized out>
        allow_autovacuum_cancel = 1 '\001'
        myWaitStatus = <optimized out>
        proc = <optimized out>
        i = <optimized out>
        __func__ = "ProcSleep"
#3  0x00000000006398f9 in WaitOnLock (locallock=locallock@entry=0x1e30560, 
owner=owner@entry=0x1ef6e28) at lock.c:1586
        save_exception_stack = 0x7fff26f0d970
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {32468520, 934449783848447161, 53, 
140630153119160, 140733846705792, 5, 934449784732921017, -934048294823514951}, 
__mask_was_saved = 0, __saved_mask = {__val = {0, 0, 140630152524792, 
2790716164, 140630155490496, 5, 8774864, 5, 6529124, 47, 0, 140630152524792, 
140630155490496, 456, 0, 31655264}}}}
        lockmethodid = <optimized out>
        lockMethodTable = 0x85e4c0 <default_lockmethod>
        new_status = 0x1f16e60 "SELECT"
#4  0x000000000063abeb in LockAcquireExtended 
(locktag=locktag@entry=0x7fff26f0d680, lockmode=lockmode@entry=5, 
sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', 
reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:957
        lockmethodid = <optimized out>
        lockMethodTable = 0x85e4c0 <default_lockmethod>
        localtag = {lock = {locktag_field1 = 124785, locktag_field2 = 0, 
locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', 
locktag_lockmethodid = 1 '\001'}, mode = 5}
        locallock = <optimized out>
        lock = 0x7fe7024933f8
        proclock = 0x7fe7025245b8
        found = 0 '\000'
        owner = 0x1ef6e28
        hashcode = 2790716164
        partitionLock = 53
        status = <optimized out>
        log_lock = 0 '\000'
        __func__ = "LockAcquireExtended"
#5  0x000000000063afb1 in LockAcquire (locktag=locktag@entry=0x7fff26f0d680, 
lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', 
dontWait=dontWait@entry=0 '\000') at lock.c:662
No locals.
#6  0x0000000000639350 in XactLockTableWait (xid=124785) at lmgr.c:495
        tag = {locktag_field1 = 124785, locktag_field2 = 0, locktag_field3 = 0, 
locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}
#7  0x000000000057c0e8 in EvalPlanQualFetch (estate=estate@entry=0x1f13e60, 
relation=0x7fe70018f1d8, lockmode=lockmode@entry=0, 
tid=tid@entry=0x7fff26f0d7c0, priorXmax=<optimized out>) at execMain.c:1835
        test = <optimized out>
        update_ctid = {ip_blkid = {bi_hi = 55232, bi_lo = 9968}, ip_posid = 
32767}
        update_xmax = 0
        buffer = 417
        copyTuple = 0x0
        tuple = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, 
ip_posid = 11}, t_tableOid = 365619, t_data = 0x7fe7006fec20}
        SnapshotDirty = {satisfies = 0x734d90 <HeapTupleSatisfiesDirty>, xmin = 
0, xmax = 124785, xip = 0x0, xcnt = 0, subxcnt = 0, subxip = 0x0, suboverflowed 
= 96 '`', takenDuringRecovery = 66 'B', copied = -15 '\361', curcid = 0, 
active_count = 1, regd_count = 0}
        __func__ = "EvalPlanQualFetch"
#8  0x000000000059063d in ExecLockRows (node=node@entry=0x1f14070) at 
nodeLockRows.c:147
        aerm = <optimized out>
        isNull = 0 '\000'
        buffer = 417
        erm = 0x1f13ff0
        tuple = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, 
ip_posid = 2}, t_tableOid = 365619, t_data = 0x7fe7006fed40}
        update_xmax = 124784
        lockmode = LockTupleShared
        copyTuple = <optimized out>
        datum = <optimized out>
        update_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 11}
        test = <optimized out>
        slot = 0x1f15060
        estate = 0x1f13e60
        outerPlan = 0x1f14260
        epq_started = 0 '\000'
        lc = 0x1f162e8
        __func__ = "ExecLockRows"
#9  0x000000000057ce48 in ExecProcNode (node=node@entry=0x1f14070) at 
execProcnode.c:495
        result = <optimized out>
        __func__ = "ExecProcNode"
#10 0x000000000057a8e0 in ExecutePlan (dest=0x1f0f908, direction=<optimized 
out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, 
planstate=0x1f14070, estate=0x1f13e60) at execMain.c:1395
        slot = <optimized out>
        current_tuple_count = 1
#11 standard_ExecutorRun (queryDesc=0x1e34030, direction=<optimized out>, 
count=0) at execMain.c:303
        estate = 0x1f13e60
        operation = CMD_SELECT
        dest = 0x1f0f908
        sendTuples = <optimized out>
        oldcontext = 0x1ef6c88
#12 0x000000000064eb50 in PortalRunSelect (portal=portal@entry=0x1e31000, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x1f0f908) at pquery.c:944
        queryDesc = 0x1e34030
        direction = <optimized out>
        nprocessed = <optimized out>
        __func__ = "PortalRunSelect"
#13 0x000000000064fe8b in PortalRun (portal=0x1e31000, 
count=9223372036854775807, isTopLevel=<optimized out>, dest=0x1f0f908, 
altdest=0x1f0f908, completionTag=0x7fff26f0dcd0 "") at pquery.c:788
        save_exception_stack = 0x7fff26f0dbe0
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {32233432, -934044991753147207, 
32233944, 32569608, 2, 32176976, 934449784546274489, -934048285353825095}, 
__mask_was_saved = 0, __saved_mask = {__val = {7438292, 32176976, 32158096, 
144, 31657984, 8956638, 80, 31524864, 32233944, 8709199, 2, 32176976, 7512933, 
32233432, 2, 31657984}}}}
        result = <optimized out>
        nprocessed = <optimized out>
        saveTopTransactionResourceOwner = 0x1eafc60
        saveTopTransactionContext = 0x1eafb50
        saveActivePortal = 0x0
        saveResourceOwner = 0x1eafc60
        savePortalContext = 0x0
        saveMemoryContext = 0x1eafb50
        __func__ = "PortalRun"
#14 0x000000000064be96 in PostgresMain (argc=<optimized out>, argv=<optimized 
out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1046
        firstchar = 32233432
        input_message = {data = 0x1ebc6a0 "\n\tSELECT xmin, xmax, ctid, * FROM 
skip_locked WHERE pg_advisory_lock(0) is not null FOR SHARE NOWAIT;\n", len = 
103, maxlen = 1024, cursor = 103}
        local_sigjmp_buf = {{__jmpbuf = {31527232, -934045008961327943, 1, 
11808576, 0, 1, 934449784523205817, -934048287395402567}, __mask_was_saved = 1, 
__saved_mask = {__val = {0, 11808576, 0, 60, 267473953272, 2, 206158430256, 
140733846707520, 140733846707312, 0, 11808576, 0, 8192, 32, 31527232, 0}}}}
        send_ready_for_query = 0 '\000'
        __func__ = "PostgresMain"
#15 0x000000000060d7be in BackendRun (port=0x1e352a0) at postmaster.c:3614
        ac = 1
        secs = 428732844
        usecs = 503164
        i = <optimized out>
        av = 0x1e11178
        maxac = <optimized out>
#16 BackendStartup (port=0x1e352a0) at postmaster.c:3304
        bn = 0x1e35260
        pid = 0
#17 ServerLoop () at postmaster.c:1367
        rmask = {fds_bits = {32, 0 <repeats 15 times>}}
        selres = <optimized out>
        readmask = {fds_bits = {56, 0 <repeats 15 times>}}
        now = <optimized out>
        last_touch_time = 1375414727
        __func__ = "ServerLoop"
#18 0x000000000060e42c in PostmasterMain (argc=argc@entry=5, 
argv=argv@entry=0x1e0f320) at postmaster.c:1127
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = <optimized out>
        i = <optimized out>
        __func__ = "PostmasterMain"
#19 0x000000000045b7c9 in main (argc=5, argv=0x1e0f320) at main.c:199
No locals.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to