On 2014-03-17 14:16:41 -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
> >> IIUC, this case only occurs when using the new-in-9.3 types of
> >> nonexclusive row locks.  I'm willing to bet that the number of
> >> applications using those is negligible; so I think it's all right to not
> >> mention that case explicitly, as long as the wording doesn't say that
> >> foreign keys are the *only* cause (which I didn't).
> 
> > I actually think the issue could also occur with row locks of other
> > severities (is that the correct term?).
> 
> The commit log entry says
>     
>     We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
>     WAL replay of a tuple-lock operation, which is incorrect when the tuple
>     is already updated.
>     
>     Back-patch to 9.3.  The clearing of both header elements was there
>     previously, but since no update could be present on a tuple that was
>     being locked, it was harmless.
> 
> which I read to mean that the case can't occur with the types of row
> locks that were allowed pre-9.3.

That's not an unreasonable interpretation of the commit message, but I
don't think it's correct with respect to the code :(

> > but if I see correctly it's also triggerable if a backend waits for an
> > updating transaction to finish and follow_updates = true is passed to
> > heap_lock_tuple(). Which e.g. nodeLockRows.c does...
> 
> That sounds backwards.  nodeLockRows locks the latest tuple in the chain,
> so it can't be subject to this.

Hm, I don't see anything in the code preventing it, that's the
lock_tuple() before the EPQ stuff... in ExecLockRows():
        foreach(lc, node->lr_arowMarks)
        {
                test = heap_lock_tuple(erm->relation, &tuple,
                                                           
estate->es_output_cid,
                                                           lockmode, 
erm->noWait, true,
                                                           &buffer, &hufd);
                ReleaseBuffer(buffer);
                switch (test)
                {
                        case HeapTupleSelfUpdated:
...

the true passed to heap_lock_tuple() is the follow_updates
parameter. And then in heap_lock_tuple():
                if (require_sleep)
                {
                        if (infomask & HEAP_XMAX_IS_MULTI)
                        {
...
                                /* if there are updates, follow the update 
chain */
                                if (follow_updates &&
                                        !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
                                {
                                        HTSU_Result res;

                                        res = heap_lock_updated_tuple(relation, 
tuple, &t_ctid,
                                                                                
                  GetCurrentTransactionId(),
...
                        else
                        {
                                /* wait for regular transaction to end */
                                if (nowait)
                                {
                                        if 
(!ConditionalXactLockTableWait(xwait))
                                                ereport(ERROR,
                                                                
(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                                                                 errmsg("could 
not obtain lock on row in relation \"%s\"",
                                                                                
RelationGetRelationName(relation))));
                                }
                                else
                                        XactLockTableWait(xwait);

                                /* if there are updates, follow the update 
chain */
                                if (follow_updates &&
                                        !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
...
        if (RelationNeedsWAL(relation))
        {
                xl_heap_lock xlrec;
                XLogRecPtr      recptr;
                XLogRecData rdata[2];

                xlrec.target.node = relation->rd_node;
                xlrec.target.tid = tuple->t_self;
...

To me that looks sufficient to trigger the bug, because we're issuing a
wal record about the row that was passed to heap_lock_update(), not the
latest one in the ctid chain. When replaying that record, it will reset
the t_ctid field, thus breaking the chain.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to