Re: [HACKERS] UPDATE of partition key

Robert Haas Wed, 31 May 2017 14:55:51 -0700

On Mon, May 29, 2017 at 5:26 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> But I think, we can also take step-by-step approach even for v11. If
>> we agree that it is ok to silently do the updates as long as we
>> document the behaviour, we can go ahead and do this, and then as a
>> second step, implement error handling as a separate patch. If that
>> patch does not materialize, we at least have the current behaviour
>> documented.
>
> I think that is sensible approach if we find the second step involves
> big or complicated changes.


I think it is definitely a good idea to separate the two patches.
UPDATE tuple routing without any special handling for the EPQ issue is
just a partitioning feature.  The proposed handling for the EPQ issue
is an *on-disk format change*.  That turns a patch which is subject
only to routine bugs into one which can eat your data permanently --
so having the "can eat your data permanently" separated out for both
review and commit seems only prudent.  For me, it's not a matter of
which patch is big or complicated, but rather a matter of one of them
being a whole lot riskier than the other.  Even UPDATE tuple routing
could mess things up pretty seriously if we end up with tuples in the
wrong partition, of course, but the other thing is still worse.

In terms of a development plan, I think we would need to have both
patches before either could be committed.  I believe that everyone
other than me who has expressed an opinion on this issue has said that
it's unacceptable to just ignore the issue, so it doesn't sound like
there will be much appetite for having #1 go into the tree without #2.
I'm still really concerned about that approach because we do not have
very much bit space left and WARM wants to use quite a bit of it.  I
think it's quite possible that we'll be sad in the future if we find
that we can't implement feature XYZ because of the bit-space consumed
by this feature.  However, I don't have the only vote here and I'm not
going to try to shove this into the tree over multiple objections
(unless there are a lot more votes the other way, but so far there's
no sign of that).

Greg/Amit's idea of using the CTID field rather than an infomask bit
seems like a possibly promising approach.  Not everything that needs
bit-space can use the CTID field, so using it is a little less likely
to conflict with something else we want to do in the future than using
a precious infomask bit.  However, I'm worried about this:

    /* Make sure there is no forward chain link in t_ctid */
    tp.t_data->t_ctid = tp.t_self;

The comment does not say *why* we need to make sure that there is no
forward chain link, but it implies that some code somewhere in the
system does or at one time did depend on no forward link existing.
Any such code that still exists will need to be updated.  Anybody know
what code that might be, exactly?

The other potential issue I see here is that I know the WARM code also
tries to use the bit-space in the CTID field; in particular, it uses
the CTID field of the last tuple in a HOT chain to point back to the
root of the chain.  That seems like it could conflict with the usage
proposed here, but I'm not totally sure.  Has anyone investigated this
issue?

Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this.  I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people.  However, there seems to be a
discrepancy between the approach that got the most votes and the one
that is implemented by the v8 patch, so that seems like something to
fix.

For what it's worth, in the future, I imagine that we might allow
adding a trigger to a partitioned table and having that cascade down
to all descendant tables.  In that world, firing the BR UPDATE trigger
for the old partition and the AR UPDATE trigger for the new partition
will look a lot like the behavior the user would expect on an
unpartitioned table, which could be viewed as a good thing.  On the
other hand, it's still going to be a DELETE+INSERT under the hood for
the foreseeable future, so firing the delete triggers and then the
insert triggers is also defensible.  Is there any big difference
between these appraoches in terms of how much code is required to make
this work?

In terms of the approach taken by the patch itself, it seems
surprising to me that the patch only calls
ExecSetupPartitionTupleRouting when an update fails the partition
constraint.  Note that in the insert case, we call that function at
the start of execution; calling it in the middle seems to involve
additional hazards; for example, is it really safe to add additional
ResultRelInfos midway through the operation?  Is it safe to take more
locks midway through the operation? It seems like it might be a lot
safer to decide at the beginning of the operation whether this is
needed -- we can skip it if none of the columns involved in the
partition key (or partition key expressions) are mentioned in the
update.  (There's also the issue of triggers, but I'm not sure that
it's sensible to allow a trigger on an individual partition to reroute
an update to another partition; what if we get an infinite loop?)

+            if (concurrently_deleted)
+                return NULL;

I don't understand the motivation for this change, and there are no
comments explaining it that I can see.

Perhaps the concurrency-related (i.e. EPQ) behavior here could be
tested via the isolation tester.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UPDATE of partition key

Reply via email to