On 2018/06/29 14:59, Andres Freund wrote: > On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote: >> On 27 June 2018 at 18:33, Ashutosh Bapat >>> But I am worried that the code will have to create thousand slots if >>> there are thousand partitions. I think we will need to see how much >>> effect that has. >> >> I agree that it does not make sense to create as many slots, if at all >> we go by this approach. Suppose the partitioned table is the only one >> having different tuple descriptor, and rest of the partitions have >> same tuple descriptor. In that case, keep track of unique descriptors, >> and allocate a slot per unique descriptor. > > Why? Compared to the rest of the structures created, a slot is not > particularly expensive? I don't see what you're optimizing here.
What I'm thinking of doing is something that's inspired by one of the things that David Rowley proposes in his patch for PG 12 to remove inefficiencies in the tuple routing code [1]. Instead of a single TupleTableSlot attached at partition_tuple_slot, we allocate an array of TupleTableSlot pointers of same length as the number of partitions, as you mentioned upthread. We then call MakeTupleTableSlot() only if a partition needs it and pass it the partition's TupleDesc. Allocated slots are remembered in a list. ExecDropSingleTupleTableSlot is then called on those allocated slots when the plan ends. Note that the array of slots is not allocated if none of the partitions affected by a given query (or copy) needed to convert tuples. Other issues that you mentioned, such as needless heap_tuple_deform/form being invoked, seem less localized (to me) than this particular issue, so I created a patch for just this, which is attached with this email. I'm thinking about how to fix the other issues, but will need a bit more time. Thanks, Amit [1] https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=gvbwvgh4a...@mail.gmail.com
From 126ddc16956ae0aac0cc53ce9d522c76cc99706c Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Thu, 28 Jun 2018 15:47:47 +0900 Subject: [PATCH v1] Allocate dedicated slots of partition tuple conversion Currently there is just one slot called partition_tuple_slot and we do ExecSetSlotDescriptor every time a tuple is inserted into a partition that requires tuple conversion. That's excessively inefficient during bulk inserts. Fix that by allocating a dedicated slot for each partitions that requires it. --- src/backend/commands/copy.c | 15 +++++++--- src/backend/executor/execMain.c | 37 ++++++++++++++----------- src/backend/executor/execPartition.c | 50 ++++++++++++++++++++++++---------- src/backend/executor/nodeModifyTable.c | 26 ++++++++++++------ src/include/executor/execPartition.h | 12 ++++++-- 5 files changed, 95 insertions(+), 45 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 3a66cb5025..4bc5a0199b 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2612,6 +2612,7 @@ CopyFrom(CopyState cstate) { int leaf_part_index; PartitionTupleRouting *proute = cstate->partition_tuple_routing; + TupleConversionMap *map; /* * Away we go ... If we end up not finding a partition after all, @@ -2693,10 +2694,16 @@ CopyFrom(CopyState cstate) * We might need to convert from the parent rowtype to the * partition rowtype. */ - tuple = ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[leaf_part_index], - tuple, - proute->partition_tuple_slot, - &slot); + map = proute->parent_child_tupconv_maps[leaf_part_index]; + if (map != NULL) + { + TupleTableSlot *new_slot; + + Assert(proute->partition_tuple_slots != NULL && + proute->partition_tuple_slots[leaf_part_index] != NULL); + new_slot = proute->partition_tuple_slots[leaf_part_index]; + tuple = ConvertPartitionTupleSlot(map, tuple, new_slot, &slot); + } tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc); } diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 969944cc12..9bd5af1ad4 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1933,12 +1933,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo, /* a reverse map */ map = convert_tuples_by_name(old_tupdesc, tupdesc, gettext_noop("could not convert row type")); + /* Input slot might be of a partition, which has a fixed tupdesc. */ + slot = MakeTupleTableSlot(tupdesc); if (map != NULL) - { tuple = do_convert_tuple(tuple, map); - ExecSetSlotDescriptor(slot, tupdesc); - ExecStoreTuple(tuple, slot, InvalidBuffer, false); - } + ExecStoreTuple(tuple, slot, InvalidBuffer, false); } insertedCols = GetInsertedColumns(resultRelInfo, estate); @@ -2012,12 +2011,14 @@ ExecConstraints(ResultRelInfo *resultRelInfo, /* a reverse map */ map = convert_tuples_by_name(orig_tupdesc, tupdesc, gettext_noop("could not convert row type")); + /* + * Input slot might be partition-specific, which has a + * fixed tupdesc. + */ + slot = MakeTupleTableSlot(tupdesc); if (map != NULL) - { tuple = do_convert_tuple(tuple, map); - ExecSetSlotDescriptor(slot, tupdesc); - ExecStoreTuple(tuple, slot, InvalidBuffer, false); - } + ExecStoreTuple(tuple, slot, InvalidBuffer, false); } insertedCols = GetInsertedColumns(resultRelInfo, estate); @@ -2060,12 +2061,14 @@ ExecConstraints(ResultRelInfo *resultRelInfo, /* a reverse map */ map = convert_tuples_by_name(old_tupdesc, tupdesc, gettext_noop("could not convert row type")); + /* + * Input slot might be partition-specific, which has a + * fixed tupdesc. + */ + slot = MakeTupleTableSlot(tupdesc); if (map != NULL) - { tuple = do_convert_tuple(tuple, map); - ExecSetSlotDescriptor(slot, tupdesc); - ExecStoreTuple(tuple, slot, InvalidBuffer, false); - } + ExecStoreTuple(tuple, slot, InvalidBuffer, false); } insertedCols = GetInsertedColumns(resultRelInfo, estate); @@ -2166,12 +2169,14 @@ ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo, /* a reverse map */ map = convert_tuples_by_name(old_tupdesc, tupdesc, gettext_noop("could not convert row type")); + /* + * Input slot might be partition-specific, which has a + * fixed tupdesc. + */ + slot = MakeTupleTableSlot(tupdesc); if (map != NULL) - { tuple = do_convert_tuple(tuple, map); - ExecSetSlotDescriptor(slot, tupdesc); - ExecStoreTuple(tuple, slot, InvalidBuffer, false); - } + ExecStoreTuple(tuple, slot, InvalidBuffer, false); } insertedCols = GetInsertedColumns(resultRelInfo, estate); diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index 7a4665cc4e..6c541ca43b 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -114,17 +114,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) * We need an additional tuple slot for storing transient tuples that * are converted to the root table descriptor. */ - proute->root_tuple_slot = MakeTupleTableSlot(NULL); + proute->root_tuple_slot = MakeTupleTableSlot(RelationGetDescr(rel)); } - /* - * Initialize an empty slot that will be used to manipulate tuples of any - * given partition's rowtype. It is attached to the caller-specified node - * (such as ModifyTableState) and released when the node finishes - * processing. - */ - proute->partition_tuple_slot = MakeTupleTableSlot(NULL); - i = 0; foreach(cell, leaf_parts) { @@ -687,6 +679,33 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, gettext_noop("could not convert row type")); /* + * Initialize an empty slot that will be used to manipulate tuples of any + * this partition's rowtype. + */ + if (proute->parent_child_tupconv_maps[partidx] != NULL) + { + Relation partrel = partRelInfo->ri_RelationDesc; + /* + * Initialized the array where these slots are stored, if not already + * done. + */ + if (proute->partition_tuple_slots == NULL) + proute->partition_tuple_slots = (TupleTableSlot **) + palloc0(proute->num_partitions * + sizeof(TupleTableSlot *)); + + /* + * Now initialize the slot itself and add it to the list of allocated + * slots. + */ + proute->partition_tuple_slots[partidx] = + MakeTupleTableSlot(RelationGetDescr(partrel)); + proute->partition_tuple_slots_alloced = + lappend(proute->partition_tuple_slots_alloced, + proute->partition_tuple_slots[partidx]); + } + + /* * If the partition is a foreign table, let the FDW init itself for * routing tuples to the partition. */ @@ -778,8 +797,7 @@ ConvertPartitionTupleSlot(TupleConversionMap *map, TupleTableSlot *new_slot, TupleTableSlot **p_my_slot) { - if (!map) - return tuple; + Assert(map != NULL && new_slot != NULL); tuple = do_convert_tuple(tuple, map); @@ -788,7 +806,6 @@ ConvertPartitionTupleSlot(TupleConversionMap *map, */ *p_my_slot = new_slot; Assert(new_slot != NULL); - ExecSetSlotDescriptor(new_slot, map->outdesc); ExecStoreTuple(tuple, new_slot, InvalidBuffer, true); return tuple; @@ -806,6 +823,7 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate, { int i; int subplan_index = 0; + ListCell *lc; /* * Remember, proute->partition_dispatch_info[0] corresponds to the root @@ -862,8 +880,12 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate, /* Release the standalone partition tuple descriptors, if any */ if (proute->root_tuple_slot) ExecDropSingleTupleTableSlot(proute->root_tuple_slot); - if (proute->partition_tuple_slot) - ExecDropSingleTupleTableSlot(proute->partition_tuple_slot); + foreach(lc, proute->partition_tuple_slots_alloced) + { + TupleTableSlot *slot = lfirst(lc); + + ExecDropSingleTupleTableSlot(slot); + } } /* diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 7e0b867971..542bfe103d 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -1128,10 +1128,13 @@ lreplace:; map_index = resultRelInfo - mtstate->resultRelInfo; Assert(map_index >= 0 && map_index < mtstate->mt_nplans); tupconv_map = tupconv_map_for_subplan(mtstate, map_index); - tuple = ConvertPartitionTupleSlot(tupconv_map, - tuple, - proute->root_tuple_slot, - &slot); + if (tupconv_map != NULL) + { + tuple = ConvertPartitionTupleSlot(tupconv_map, + tuple, + proute->root_tuple_slot, + &slot); + } /* * Prepare for tuple routing, making it look like we're inserting @@ -1669,6 +1672,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate, int partidx; ResultRelInfo *partrel; HeapTuple tuple; + TupleConversionMap *map; /* * Determine the target partition. If ExecFindPartition does not find a @@ -1756,10 +1760,16 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate, /* * Convert the tuple, if necessary. */ - ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[partidx], - tuple, - proute->partition_tuple_slot, - &slot); + map = proute->parent_child_tupconv_maps[partidx]; + if (map != NULL) + { + TupleTableSlot *new_slot; + + Assert(proute->partition_tuple_slots != NULL && + proute->partition_tuple_slots[partidx] != NULL); + new_slot = proute->partition_tuple_slots[partidx]; + ConvertPartitionTupleSlot(map, tuple, new_slot, &slot); + } /* Initialize information needed to handle ON CONFLICT DO UPDATE. */ Assert(mtstate != NULL); diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h index 862bf65060..7fa460398f 100644 --- a/src/include/executor/execPartition.h +++ b/src/include/executor/execPartition.h @@ -86,10 +86,15 @@ typedef struct PartitionDispatchData *PartitionDispatch; * element of this array has the index into the * corresponding partition in partitions array. * num_subplan_partition_offsets Length of 'subplan_partition_offsets' array - * partition_tuple_slot TupleTableSlot to be used to manipulate any + * partition_tuple_slots TupleTableSlots to be used to manipulate any * given leaf partition's rowtype after that * partition is chosen for insertion by - * tuple-routing. + * tuple-routing; contains valid entry for each + * leaf partition whose attribute numbers are + * found to differ from the root parent and NULL + * otherwise + * partition_tuple_slots_alloced List of allocated slots that are stored in + * partition_tuple_slots * root_tuple_slot TupleTableSlot to be used to transiently hold * copy of a tuple that's being moved across * partitions in the root partitioned table's @@ -108,7 +113,8 @@ typedef struct PartitionTupleRouting bool *child_parent_map_not_required; int *subplan_partition_offsets; int num_subplan_partition_offsets; - TupleTableSlot *partition_tuple_slot; + TupleTableSlot **partition_tuple_slots; + List *partition_tuple_slots_alloced; TupleTableSlot *root_tuple_slot; } PartitionTupleRouting; -- 2.11.0