On Wed, Oct 5, 2011 at 10:29 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Oct 5, 2011 at 5:53 PM, Alex Hunsaker <bada...@gmail.com> wrote:
>> tldr:
>>
>> Seems to be broken by
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4da99ea4231e3d8bbf28b666748c1028e7b7d665
>> :
>> commit 4da99ea4231e3d8bbf28b666748c1028e7b7d665
>> Author: Robert Haas <rh...@postgresql.org>
>> Date:   Mon Jun 27 10:27:17 2011 -0400
>>
>>    Avoid having two copies of the HOT-chain search logic.
>
> Hmm... that's pretty terrible.  Yikes.  That commit wasn't intended to
> change any behavior, just to clean up the code, so I think the right
> thing to do here is figure out how I changed the behavior without
> meaning too, rather than to start patching up all the places that
> might have been affected by whatever the behavior change was.  I'm too
> tired to figure this out right now, but I'll spend some time staring
> at it tomorrow.

Oh, I see the problem, and I now agree that it's the DROP CONSTRAINT
code that is buggy.

Here's what happened. In the old code, when index_getnext() is walking
a HOT chain, before returning each tuple, it remembers the XMAX of
that tuple and the offset of the next tuple in the chain.  So in the
following scenario, we can fail to walk the entire HOT chain:

1. index_getnext() returns what is currently the last tuple in the
chain, remembering the xmax and next tid
2. the caller, or some other process, performs a HOT update of that tuple
3. now index_getnext() is called again, but it uses the remembered
xmax and tid, which are now out-of-date

The new code, on the other hand, doesn't remember the xmax and tid;
instead, it waits until the next call to index_getnext(), and then
fetches them from the previous-returned tuple at that time.  So it
always sees the most current values and, therefore, walks the entire
chain.

In this particular case, the DROP CONSTRAINT code is counting on the
fact that it won't see its own updates, even though it is bumping the
command counter after each one, which is clearly wrong.  I don't think
this was really safe even under the old coding, because there's no
guarantee that any particular update will be HOT, so we might have
found our own update anyway; it was just less likely.

I took a look around for other places that might have this same
problem and didn't find any, but I'm not too confident that that means
there are none there, since there are a fair number of things that can
call CommandCounterIncrement() indirectly.  shdepReassignOwned() does
a direct CCI from within a scan, but ISTM that any update we do there
would produce a new tuple that doesn't match the scan, so that should
be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to