Re: [HACKERS] UPDATE of partition key

Amit Khandekar Mon, 19 Jun 2017 23:55:31 -0700

On 20 June 2017 at 03:46, Robert Haas <[email protected]> wrote:
> On Thu, Jun 15, 2017 at 1:36 PM, Amit Khandekar <[email protected]> 
> wrote:
>> Attached patch v10 fixes the above. In the existing code, where it
>> builds WCO constraints for each leaf partition; with the patch, that
>> code now is applicable to row-movement-updates as well.
>
> I guess I don't see why it should work like this.  In the INSERT case,
> we must build withCheckOption objects for each partition because those
> partitions don't appear in the plan otherwise -- but in the UPDATE
> case, they're already there, so why do we need to build anything at
> all?  Similarly for RETURNING projections.  How are the things we need
> for those cases not already getting built, associated with the
> relevant resultRelInfos?  Maybe there's a concern if some children got
> pruned - they could turn out later to be the children into which
> tuples need to be routed. But the patch makes no distinction
> between possibly-pruned children and any others.


Yes, only a subset of the partitions appear in the UPDATE subplans. I
think typically for updates, a very small subset of the total leaf
partitions will be there in the plans, others would get pruned. IMHO,
it would not be worth having an optimization where it opens only those
leaf partitions which are not already there in the subplans. Without
the optimization, we are able to re-use the INSERT infrastructure
without additional changes.


>
>> There is another issue I discovered. The row-movement works fine if
>> the destination leaf partition has different attribute ordering than
>> the root : the existing insert-tuple-routing mapping handles that. But
>> if the source partition has different ordering w.r.t. the root, it has
>> a problem : there is no mapping in the opposite direction, i.e. from
>> the leaf to root. And we require that because the tuple of source leaf
>> partition needs to be converted to root partition tuple descriptor,
>> since ExecFindPartition() starts with root.
>
> Seems reasonable, but...
>
>> To fix this, I have introduced another mapping array
>> mtstate->mt_resultrel_maps[]. This corresponds to the
>> mtstate->resultRelInfo[]. We don't require per-leaf-partition mapping,
>> because the update result relations are pruned subset of the total
>> leaf partitions.
>
> ... I don't understand how you can *not* need a per-leaf-partition
> mapping.  I mean, maybe you only need the mapping for the *unpruned*
> leaf partitions

Yes, we need the mapping only for the unpruned leaf partitions, and
those partitions are available in the per-subplan resultRelInfo's.

> but you certainly need a separate mapping for each one of those.

You mean *each* of the leaf partitions ? I didn't get why we would
need it for each one. The tuple targeted for update belongs to one of
the per-subplan resultInfos. And this tuple is to be routed to another
leaf partition. So the reverse mapping is for conversion from the
source resultRelinfo to the root partition. I am unable to figure out
a scenario where we would require this reverse mapping for partitions
on which UPDATE is *not* going to be executed.

>
> It's possible to imagine driving the tuple routing off of just the
> partition key attributes, extracted from wherever they are inside the
> tuple at the current level, rather than converting to the root's tuple
> format.  However, that's not totally straightforward because there
> could be multiple levels of partitioning throughout the tree and
> different attributes might be needed at different levels.

Yes, the conversion anyway occurs at each of these levels even for
insert, specifically because there can be different partition
attributes each time. For update, its only one additional conversion.
But yes, this new mapping would be required for this one single
conversion.

> Moreover,
> in most cases, the mappings are going to end up being no-ops because
> the column order will be the same, so it's probably not worth
> complicating the code to try to avoid a double conversion that usually
> won't happen.

I agree.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UPDATE of partition key

Reply via email to