Re: [HACKERS] UPDATE of partition key

Robert Haas Thu, 16 Feb 2017 07:24:06 -0800

On Thu, Feb 16, 2017 at 5:47 AM, Greg Stark <st...@mit.edu> wrote:
> On 13 February 2017 at 12:01, Amit Khandekar <amitdkhan...@gmail.com> wrote:
>> There are a few things that can be discussed about :
>
> If you do a normal update the new tuple is linked to the old one using
> the ctid forming a chain of tuple versions. This tuple movement breaks
> that chain.  So the question I had reading this proposal is what
> behaviour depends on ctid and how is it affected by the ctid chain
> being broken.


I think this is a good question.

> I think the concurrent update case is just a symptom of this. If you
> try to update a row that's locked for a concurrent update you normally
> wait until the concurrent update finishes, then follow the ctid chain
> and recheck the where clause on the target of the link and if it still
> matches you perform the update there.

Right.  EvalPlanQual behavior, in short.

> At least you do that if you have isolation_level set to
> repeatable_read. If you have isolation level set to serializable then
> you just fail with a serialization failure. I think that's what you
> should do if you come across a row that's been updated with a broken
> ctid chain even in repeatable read mode. Just fail with a
> serialization failure and document that in partitioned tables if you
> perform updates that move tuples between partitions then you need to
> be ensure your updates are prepared for serialization failures.

Now, this part I'm not sure about.  What's pretty clear is that,
barring some redesign of the heap format, we can't keep the CTID chain
intact when the tuple moves to a different relfilenode.  What's less
clear is what to do about that.  We can either (1) give up on
EvalPlanQual behavior in this case and act just as we would if the row
had been deleted; no update happens or (2) throw a serialization
error.  You're advocating for #2, but I'm not sure that's right,
because:

1. It's a lot more work,

2. Your proposed implementation needs an on-disk format change that
uses up a scarce infomask bit, and

3. It's not obvious to me that it's clearly preferable from a user
experience standpoint.  I mean, either way the user doesn't get the
behavior that they want.  Either they're hoping for EPQ semantics and
they instead do a no-op update, or they're hoping for EPQ semantics
and they instead get an ERROR.  Generally speaking, we don't throw
serialization errors today at READ COMMITTED, so if we do so here,
that's going to be a noticeable and perhaps unwelcome change.

More opinions welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UPDATE of partition key

Reply via email to