Re: [HACKERS] bug in locking an update tuple chain

2017-07-26 Thread Alvaro Herrera
> The attached patch fixes the problem.  When locking some old tuple version of
> the chain, if we detect that we already hold that lock
> (test_lockmode_for_conflict returns HeapTupleSelfUpdated), do not try to lock
> it again but instead skip ahead to the next version.  This fixes the synthetic
> case in my isolationtester as well as our customer's production case.

Pushed.
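
For readers following along, the behaviour change described above can be sketched roughly as follows. This is a simplified, self-contained model and not the PostgreSQL source: the `Model*` types, the `model_*` functions, and the `skip_self_locked` flag are all hypothetical names invented for illustration. Only the idea is taken from the patch description: when the conflict test reports that we already hold the lock on a tuple version, skip to the next version instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the result codes discussed in this thread. */
typedef enum {
    MODEL_MAY_BE_UPDATED,   /* no conflict: this version can be locked */
    MODEL_SELF_UPDATED,     /* the current transaction already holds the lock */
    MODEL_BEING_UPDATED     /* a conflicting locker exists */
} ModelResult;

/* One version in an update chain (assumed layout, for illustration only). */
typedef struct ModelTuple {
    bool locked_by_me;
    bool locked_by_conflicting_xact;
    struct ModelTuple *next;    /* newer version, or NULL at the chain end */
} ModelTuple;

/* Plays the role that test_lockmode_for_conflict has in the description. */
static ModelResult model_test_lockmode(const ModelTuple *tup)
{
    if (tup->locked_by_me)
        return MODEL_SELF_UPDATED;
    if (tup->locked_by_conflicting_xact)
        return MODEL_BEING_UPDATED;
    return MODEL_MAY_BE_UPDATED;
}

/*
 * Walk the chain from 'start' and lock every version.
 * With skip_self_locked == false (old behaviour in this model), meeting a
 * version we already locked is reported as a failure.  With
 * skip_self_locked == true (patched behaviour), such versions are skipped.
 * *newly_locked receives the number of versions locked by this call.
 */
static ModelResult model_lock_chain(ModelTuple *start, bool skip_self_locked,
                                    int *newly_locked)
{
    assert(newly_locked != NULL);
    *newly_locked = 0;

    for (ModelTuple *tup = start; tup != NULL; tup = tup->next) {
        ModelResult res = model_test_lockmode(tup);

        if (res == MODEL_SELF_UPDATED && skip_self_locked)
            continue;           /* already ours: move on to the next version */
        if (res != MODEL_MAY_BE_UPDATED)
            return res;         /* report the failure to the caller */

        tup->locked_by_me = true;
        (*newly_locked)++;
    }
    return MODEL_MAY_BE_UPDATED;
}
```

In this model, a second walk over a chain whose first versions are already locked by the caller fails when `skip_self_locked` is false and completes when it is true, which mirrors the retry case described in the report.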

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in locking an update tuple chain

2017-07-26 Thread Alvaro Herrera
Amit Kapila wrote:
> On Sat, Jul 15, 2017 at 2:30 AM, Alvaro Herrera wrote:

> > a transaction wants to lock the
> > updated version of some tuple, and it does so; and some other
> > transaction is also locking the same tuple concurrently in a compatible
> > way.  So both are okay to proceed concurrently.  The problem is that if
> > one of them detects that anything changed in the process of doing this
> > (such as the other session updating the multixact to include itself,
> > both having compatible lock modes), it loops back to ensure xmax/
> > infomask are still sane; but heap_lock_updated_tuple_rec is not prepared
> > to deal with the situation of "the current transaction has the lock
> > already", so it returns a failure and the tuple is returned as "not
> > visible" causing the described problem.
> 
> Your fix seems logical to me, though I have not tested it yet.
> However, I wonder why heap_lock_tuple needs to restart from the
> beginning of the update chain in this case?

Well, it's possible that we could change things so that it doesn't need
to re-start from the same spot where it initially began, but I think it
requires changing too much code; I'd rather not touch it in a
back-patchable bug fix.  If we really wanted, we could perhaps change
things to avoid repeated walks of the chain, but I'd see that as a pg11
(or future) change only.  (You would be forgiven for thinking that the
interactions between EvalPlanQualFetch, heap_lock_tuple and
heap_lock_updated_tuple are rather Rube Goldbergian, to use Tom's term.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] bug in locking an update tuple chain

2017-07-18 Thread Amit Kapila
On Sat, Jul 15, 2017 at 2:30 AM, Alvaro Herrera wrote:
> A customer of ours reported a problem in 9.3.14 while inserting tuples
> in a table with a foreign key, with many concurrent transactions doing
> the same: after a few insertions worked successfully, a later one would
> return failure indicating that the primary key value was not present in
> the referenced table.  It worked fine for them on 9.3.4.
>
> After some research, we determined that the problem disappeared if
> the following commit was reverted:
>
> Author: Alvaro Herrera 
> Branch: master Release: REL9_6_BR [533e9c6b0] 2016-07-15 14:17:20 -0400
> Branch: REL9_5_STABLE Release: REL9_5_4 [649dd1b58] 2016-07-15 14:17:20 -0400
> Branch: REL9_4_STABLE Release: REL9_4_9 [166873dd0] 2016-07-15 14:17:20 -0400
> Branch: REL9_3_STABLE Release: REL9_3_14 [6c243f90a] 2016-07-15 14:17:20 -0400
>
> Avoid serializability errors when locking a tuple with a committed update
>
> I spent some time writing an isolationtester spec to reproduce the
> problem.  It turned out that this required six concurrent sessions in
> order for the problem to show up at all, but once I had that, figuring
> out what was going on was simple: a transaction wants to lock the
> updated version of some tuple, and it does so; and some other
> transaction is also locking the same tuple concurrently in a compatible
> way.  So both are okay to proceed concurrently.  The problem is that if
> one of them detects that anything changed in the process of doing this
> (such as the other session updating the multixact to include itself,
> both having compatible lock modes), it loops back to ensure xmax/
> infomask are still sane; but heap_lock_updated_tuple_rec is not prepared
> to deal with the situation of "the current transaction has the lock
> already", so it returns a failure and the tuple is returned as "not
> visible" causing the described problem.
>

Your fix seems logical to me, though I have not tested it yet.
However, I wonder why heap_lock_tuple needs to restart from the
beginning of the update chain in this case?
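
As background for "locking the same tuple concurrently in a compatible way" in the quoted report, the documented row-level lock conflict rules can be written down as a small table. The sketch below is illustrative only: the `RowLockMode` enum, the `row_lock_conflicts` array, and `row_locks_compatible` are hypothetical names, not PostgreSQL's internal representation. The conflict pattern itself follows the row-level lock table in the PostgreSQL documentation. Foreign-key checks lock the referenced row with FOR KEY SHARE, so many concurrent inserters can hold compatible locks on the same referenced tuple.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four row-level lock strengths. */
typedef enum {
    ROW_LOCK_KEY_SHARE = 0,     /* FOR KEY SHARE */
    ROW_LOCK_SHARE,             /* FOR SHARE */
    ROW_LOCK_NO_KEY_UPDATE,     /* FOR NO KEY UPDATE */
    ROW_LOCK_UPDATE,            /* FOR UPDATE */
    ROW_LOCK_COUNT
} RowLockMode;

/* true means the two modes conflict; the matrix is symmetric. */
static const bool row_lock_conflicts[ROW_LOCK_COUNT][ROW_LOCK_COUNT] = {
    /*                     KEY SHARE  SHARE  NO KEY UPD  UPDATE */
    /* KEY SHARE     */  { false,     false, false,      true },
    /* SHARE         */  { false,     false, true,       true },
    /* NO KEY UPDATE */  { false,     true,  true,       true },
    /* UPDATE        */  { true,      true,  true,       true },
};

/* Two lockers may hold their locks at the same time iff no conflict. */
static bool row_locks_compatible(RowLockMode held, RowLockMode wanted)
{
    assert((int) held >= 0 && (int) held < ROW_LOCK_COUNT);
    assert((int) wanted >= 0 && (int) wanted < ROW_LOCK_COUNT);
    return !row_lock_conflicts[held][wanted];
}
```

For example, `row_locks_compatible(ROW_LOCK_KEY_SHARE, ROW_LOCK_KEY_SHARE)` is true, which corresponds to two sessions running foreign-key checks against the same referenced row.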

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
