On 2018/07/21 0:17, David Rowley wrote:
> On 20 July 2018 at 21:44, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
>> But I don't think the result of make_partition_pruneinfo itself has to be
>> List of PartitionedRelPruneInfo nested under PartitionPruneInfo.  I gather
>> that each PartitionPruneInfo corresponds to each root partitioned table
>> and a PartitionedRelPruneInfo contains the actual pruning information,
>> which is created for every partitioned table (non-leaf tables), including
>> the root tables.  I don't think such nesting is necessary.  I think that
>> just like flattened partitioned_rels list, we should put flattened list of
>> PartitionedRelPruneInfo into the Append or MergeAppend plan.  No need for
>> nesting PartitionedRelPruneInfo under PartitionPruneInfo.
>
> To do that properly you'd need to mark the target partitioned table of
> each hierarchy. Your test of pg_class.relispartition is not going to
> work as you're assuming the query is always going to the root. The
> user might query some intermediate partitioned table (which will have
> relispartition = true). Your patch will fall flat in that case.

Yeah, I forgot to consider that.

> You could work around that by having some array that points to the
> target partitioned table of each hierarchy, but I don't see why that's
> better than having the additional struct.

Or it could be a Bitmapset called root_indexes that stores, for every
partitioned_rels sub-list in the list that's passed to
make_partition_pruneinfo, the offset of its first Index value.

> There's also some code
> inside ExecFindInitialMatchingSubPlans() which does a backward scan
> over the partitions. This must process children before their parents.
> Unsure how well that's going to work if we start mixing the
> hierarchies. I'm sure it can be made to work providing each hierarchy
> is stored together consecutively in the array, but it just seems
> pretty fragile to me. That code is already pretty hard to follow.

I don't see how removing a nested loop changes things for the worse.  AIUI,
the code replaces index values contained in the subplan_map arrays of
various PartitionedRelPruningData structs to account for any pruned
sub-plans.  Removing a nesting level because of having removed the nesting
struct doesn't seem to affect anything about that translation.  But your
point here seems to be about the relative ordering of
PartitionedRelPruningData structs among themselves being affected due to
their now being put into a flat array, although I don't see that as being
any more fragile.  We already assume a good deal about the relative
ordering of sub-plans and of PartitionedRelPruningData structs, because we
rely on storing their indexes in subplan_map and subpart_map.

Also, it occurred to me that the new subplan indexes that
ExecFindInitialMatchingSubPlans computes are based on where subplans are
actually stored in the AppendState.appendplans array, which, in turn, is
based on the Bitmapset of "valid subplans" that
ExecFindInitialMatchingSubPlans passes back to ExecInitAppend.

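To make the index translation discussed above concrete, here is a minimal,
self-contained sketch in plain C.  It is not the executor code:
RelPruneData, build_new_subplan_indexes and renumber_after_initial_prune
are hypothetical names, and a uint64_t bitmask stands in for a Bitmapset, so
the sketch assumes fewer than 64 subplans and at most MAX_PARTS partitions
per table.  It shows the old-to-new subplan index map being derived from the
set of valid subplans, and the backward scan over a flat array that visits
children before their parents.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARTS 8

/* Hypothetical, simplified stand-in for PartitionedRelPruningData. */
typedef struct RelPruneData
{
    int         nparts;
    int         subplan_map[MAX_PARTS]; /* subplan index of a leaf, else -1 */
    int         subpart_map[MAX_PARTS]; /* flat-array index of a partitioned
                                         * child, else -1 */
    uint64_t    present_parts;          /* bit k set: partition k may match */
} RelPruneData;

/*
 * Fill new_idx[] so that surviving subplans are numbered consecutively and
 * pruned ones map to -1.  Returns the number of surviving subplans.
 */
static int
build_new_subplan_indexes(uint64_t valid, int nsubplans, int *new_idx)
{
    int         next = 0;

    assert(nsubplans < 64);
    for (int i = 0; i < nsubplans; i++)
        new_idx[i] = (valid & (UINT64_C(1) << i)) ? next++ : -1;
    return next;
}

/*
 * Translate every subplan_map entry and rebuild present_parts.  Scanning
 * from the end of the flat array processes children before parents, given
 * that each parent is stored before its children.
 */
static void
renumber_after_initial_prune(RelPruneData *rels, int nrels,
                             const int *new_idx, int nsubplans)
{
    for (int i = nrels - 1; i >= 0; i--)
    {
        RelPruneData *r = &rels[i];

        r->present_parts = 0;   /* rebuilt from scratch */
        for (int k = 0; k < r->nparts; k++)
        {
            int         oldidx = r->subplan_map[k];
            int         subidx = r->subpart_map[k];

            if (oldidx >= 0)
            {
                /* Leaf partition: translate its subplan index. */
                assert(oldidx < nsubplans);
                r->subplan_map[k] = new_idx[oldidx];
                if (new_idx[oldidx] >= 0)
                    r->present_parts |= UINT64_C(1) << k;
            }
            else if (subidx >= 0)
            {
                /* Partitioned child: present only if something survived. */
                assert(subidx > i); /* the ordering assumption */
                if (rels[subidx].present_parts != 0)
                    r->present_parts |= UINT64_C(1) << k;
            }
        }
    }
}
```

The assert(subidx > i) line is where the ordering assumption lives: with a
flat array it holds as long as subpart_map stores flat-array indexes and each
hierarchy lists parents before children.
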
> What's the reason you don't like the additional level to represent
> multiple hierarchies?

I started thinking about flattening PartitionedRelPruneInfo after looking
at flatten_partitioned_rels() in your patch.  If we're flattening
partitioned_rels (that is, not having it as a List of Lists in the finished
plan), why not flatten the pruning info node too?  As I said earlier, I get
it that we need List of Lists within the planner to get
make_partition_pruneinfo to work correctly in these types of queries, but
once we have figured out the correct details to pass to the executor to
perform run-time pruning, I don't see why we need to pass that info again
as a List of Lists.

I have attached v2 of the delta patch which adds a root_indexes field to
PartitionPruneInfo to track the topmost parent of every partition hierarchy
whose pruning info is contained in the Append.

Also, I noticed a bug with how ExecFindInitialMatchingSubPlans handles
other_subplans.  While the indexes in subplan_map arrays are updated to
contain revised values after pruning, those in the other_subplans Bitmapset
are not, leading to crashes or possibly wrong results.

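As an illustration of the kind of translation that other_subplans is
missing, here is a small sketch in plain C.  It is not the attached fix:
remap_subplan_set is a hypothetical helper, a uint64_t bitmask stands in for
a Bitmapset (so at most 64 subplans), and new_idx is assumed to be the same
old-to-new index array that is used to update subplan_map.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate a set of old subplan indexes into the numbering that is in
 * effect after initial pruning.  new_idx[i] is the new index of old
 * subplan i, or -1 if that subplan was pruned.
 */
static uint64_t
remap_subplan_set(uint64_t old_set, const int *new_idx, int nsubplans)
{
    uint64_t    new_set = 0;

    assert(nsubplans <= 64);
    for (int i = 0; i < nsubplans; i++)
    {
        if ((old_set & (UINT64_C(1) << i)) == 0)
            continue;

        /* Members of other_subplans must not be pruned. */
        assert(new_idx[i] >= 0);
        new_set |= UINT64_C(1) << new_idx[i];
    }
    return new_set;
}
```

With four subplans of which the second is pruned, the old member 3 becomes
2.  Leaving the set untranslated would keep pointing at index 3, which no
longer exists in the shortened subplan array.
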
For example:

create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);

create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);

prepare q (int, int) as
select *
from (
      select * from p
      union all
      select * from q1
      union all
      select 1, 1, 1
     ) s(a, b, c)
where s.a = $1 and s.b = $2 and s.c = (select 1);

set plan_cache_mode TO force_generic_plan;

explain (costs off, analyze) execute q (1, 1);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I have attached a fix for that as a delta patch, which results in:

explain (costs off, analyze) execute q (1, 1);
                            QUERY PLAN
──────────────────────────────────────────────────────────────────
 Append (actual time=0.153..0.179 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Result (actual time=0.023..0.032 rows=1 loops=1)
   Subplans Removed: 1
   ->  Seq Scan on p1 (actual time=0.022..0.022 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2) AND (c = $0))
   ->  Seq Scan on q111 (actual time=0.012..0.012 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2) AND (c = $0))
   ->  Result (actual time=0.014..0.022 rows=1 loops=1)
         One-Time Filter: ((1 = $1) AND (1 = $2) AND (1 = $0))
 Planning Time: 8.136 ms
 Execution Time: 0.562 ms
(12 rows)

Thanks,
Amit

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index b5f796f5ed..7d8ff821c7 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -48,7 +48,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel, bool *isnull, int maxfieldlen); static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map); -static void find_matching_subplans_recurse(PartitionPruningData *pprune, +static void find_matching_subplans_recurse(PartitionPruneState *prunestate, PartitionedRelPruningData *prelprune, bool initial_prune, Bitmapset **validsubplans); @@ -1396,14 +1396,10 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map) * * 'partitionpruneinfo' is a PartitionPruneInfo as generated by * make_partition_pruneinfo. Here we build a PartitionPruneState containing a - * PartitionPruningData for each partitionpruneinfo->prune_infos, in - * turn, a PartitionedRelPruningData is created for each - * PartitionedRelPruneInfo stored in each 'prune_infos'. This two-level system - * is required in order to support run-time pruning with UNION ALL parents - * containing one or more partitioned tables as children. The data stored in - * each PartitionedRelPruningData can be re-used each time we re-evaluate - * which partitions match the pruning steps provided in each - * PartitionedRelPruneInfo. + * PartitionedRelPruningData for each PartitionedRelPruneInfo + * in partitionpruneinfo->prune_infos. The data stored in each + * PartitionedRelPruningData can be re-used each time we re-evaluate which + * partitions match the pruning steps provided in each PartitionedRelPruneInfo. 
*/ PartitionPruneState * ExecCreatePartitionPruneState(PlanState *planstate, @@ -1422,14 +1418,15 @@ ExecCreatePartitionPruneState(PlanState *planstate, * Allocate the data structure */ prunestate = (PartitionPruneState *) - palloc(offsetof(PartitionPruneState, partprunedata) + - sizeof(PartitionPruningData *) * n_part_hierarchies); + palloc(offsetof(PartitionPruneState, partrelprunedata) + + sizeof(PartitionedRelPruningData) * n_part_hierarchies); prunestate->num_partprunedata = n_part_hierarchies; prunestate->do_initial_prune = false; /* may be set below */ prunestate->do_exec_prune = false; /* may be set below */ prunestate->execparamids = NULL; prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans); + prunestate->root_indexes = bms_copy(partitionpruneinfo->root_indexes); /* * Create a short-term memory context which we'll use when making calls to @@ -1445,127 +1442,112 @@ ExecCreatePartitionPruneState(PlanState *planstate, i = 0; foreach(lc, partitionpruneinfo->prune_infos) { + PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc)); + PartitionedRelPruningData *prelprune = &prunestate->partrelprunedata[i]; + PartitionPruneContext *context = &prelprune->context; + PartitionDesc partdesc; + PartitionKey partkey; + int partnatts; + int n_steps; ListCell *lc2; - List *partrelpruneinfos = lfirst_node(List, lc); - PartitionPruningData *prunedata; - int npartrelpruneinfos = list_length(partrelpruneinfos); - int j; - prunedata = palloc(offsetof(PartitionPruningData, partrelprunedata) + - npartrelpruneinfos * sizeof(PartitionedRelPruningData)); - prunestate->partprunedata[i] = prunedata; - prunedata->num_partrelprunedata = npartrelpruneinfos; + /* + * We must copy the subplan_map rather than pointing directly to + * the plan's version, as we may end up making modifications to it + * later. 
+ */ + prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts); + memcpy(prelprune->subplan_map, pinfo->subplan_map, + sizeof(int) * pinfo->nparts); - j = 0; - foreach(lc2, partrelpruneinfos) + /* We can use the subpart_map verbatim, since we never modify it */ + prelprune->subpart_map = pinfo->subpart_map; + + /* present_parts is also subject to later modification */ + prelprune->present_parts = bms_copy(pinfo->present_parts); + + /* + * We need to hold a pin on the partitioned table's relcache entry so + * that we can rely on its copies of the table's partition key and + * partition descriptor. We need not get a lock though; one should + * have been acquired already by InitPlan or + * ExecLockNonLeafAppendTables. + */ + context->partrel = relation_open(pinfo->reloid, NoLock); + + partkey = RelationGetPartitionKey(context->partrel); + partdesc = RelationGetPartitionDesc(context->partrel); + n_steps = list_length(pinfo->pruning_steps); + + context->strategy = partkey->strategy; + context->partnatts = partnatts = partkey->partnatts; + context->nparts = pinfo->nparts; + context->boundinfo = partdesc->boundinfo; + context->partcollation = partkey->partcollation; + context->partsupfunc = partkey->partsupfunc; + + /* We'll look up type-specific support functions as needed */ + context->stepcmpfuncs = (FmgrInfo *) + palloc0(sizeof(FmgrInfo) * n_steps * partnatts); + + context->ppccontext = CurrentMemoryContext; + context->planstate = planstate; + + /* Initialize expression state for each expression we need */ + context->exprstates = (ExprState **) + palloc0(sizeof(ExprState *) * n_steps * partnatts); + foreach(lc2, pinfo->pruning_steps) { - PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc2)); - PartitionedRelPruningData *prelprune = &prunedata->partrelprunedata[j]; - PartitionPruneContext *context = &prelprune->context; - PartitionDesc partdesc; - PartitionKey partkey; - int partnatts; - int n_steps; + PartitionPruneStepOp *step = 
(PartitionPruneStepOp *) lfirst(lc2); ListCell *lc3; + int keyno; - /* - * We must copy the subplan_map rather than pointing directly to - * the plan's version, as we may end up making modifications to it - * later. - */ - prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts); - memcpy(prelprune->subplan_map, pinfo->subplan_map, - sizeof(int) * pinfo->nparts); + /* not needed for other step kinds */ + if (!IsA(step, PartitionPruneStepOp)) + continue; - /* We can use the subpart_map verbatim, since we never modify it */ - prelprune->subpart_map = pinfo->subpart_map; + Assert(list_length(step->exprs) <= partnatts); - /* present_parts is also subject to later modification */ - prelprune->present_parts = bms_copy(pinfo->present_parts); - - /* - * We need to hold a pin on the partitioned table's relcache entry - * so that we can rely on its copies of the table's partition key - * and partition descriptor. We need not get a lock though; one - * should have been acquired already by InitPlan or - * ExecLockNonLeafAppendTables. 
- */ - context->partrel = relation_open(pinfo->reloid, NoLock); - - partkey = RelationGetPartitionKey(context->partrel); - partdesc = RelationGetPartitionDesc(context->partrel); - n_steps = list_length(pinfo->pruning_steps); - - context->strategy = partkey->strategy; - context->partnatts = partnatts = partkey->partnatts; - context->nparts = pinfo->nparts; - context->boundinfo = partdesc->boundinfo; - context->partcollation = partkey->partcollation; - context->partsupfunc = partkey->partsupfunc; - - /* We'll look up type-specific support functions as needed */ - context->stepcmpfuncs = (FmgrInfo *) - palloc0(sizeof(FmgrInfo) * n_steps * partnatts); - - context->ppccontext = CurrentMemoryContext; - context->planstate = planstate; - - /* Initialize expression state for each expression we need */ - context->exprstates = (ExprState **) - palloc0(sizeof(ExprState *) * n_steps * partnatts); - foreach(lc3, pinfo->pruning_steps) + keyno = 0; + foreach(lc3, step->exprs) { - PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3); - ListCell *lc4; - int keyno; + Expr *expr = (Expr *) lfirst(lc3); - /* not needed for other step kinds */ - if (!IsA(step, PartitionPruneStepOp)) - continue; - - Assert(list_length(step->exprs) <= partnatts); - - keyno = 0; - foreach(lc4, step->exprs) + /* not needed for Consts */ + if (!IsA(expr, Const)) { - Expr *expr = (Expr *) lfirst(lc4); + int stateidx = PruneCxtStateIdx(partnatts, + step->step.step_id, + keyno); - /* not needed for Consts */ - if (!IsA(expr, Const)) - { - int stateidx = PruneCxtStateIdx(partnatts, - step->step.step_id, - keyno); - - context->exprstates[stateidx] = - ExecInitExpr(expr, context->planstate); - } - keyno++; + context->exprstates[stateidx] = + ExecInitExpr(expr, context->planstate); } + keyno++; } - - /* Array is not modified at runtime, so just point to plan's copy */ - context->exprhasexecparam = pinfo->hasexecparam; - - prelprune->pruning_steps = pinfo->pruning_steps; - prelprune->do_initial_prune = 
pinfo->do_initial_prune; - prelprune->do_exec_prune = pinfo->do_exec_prune; - - /* Record if pruning would be useful at any level */ - prunestate->do_initial_prune |= pinfo->do_initial_prune; - prunestate->do_exec_prune |= pinfo->do_exec_prune; - - /* - * Accumulate the IDs of all PARAM_EXEC Params affecting the - * partitioning decisions at this plan node. - */ - prunestate->execparamids = bms_add_members(prunestate->execparamids, - pinfo->execparamids); - - j++; } + + /* Array is not modified at runtime, so just point to plan's copy */ + context->exprhasexecparam = pinfo->hasexecparam; + + prelprune->pruning_steps = pinfo->pruning_steps; + prelprune->do_initial_prune = pinfo->do_initial_prune; + prelprune->do_exec_prune = pinfo->do_exec_prune; + + /* Record if pruning would be useful at any level */ + prunestate->do_initial_prune |= pinfo->do_initial_prune; + prunestate->do_exec_prune |= pinfo->do_exec_prune; + + /* + * Accumulate the IDs of all PARAM_EXEC Params affecting the + * partitioning decisions at this plan node. 
+ */ + prunestate->execparamids = bms_add_members(prunestate->execparamids, + pinfo->execparamids); + i++; } + return prunestate; } @@ -1579,17 +1561,14 @@ ExecCreatePartitionPruneState(PlanState *planstate, void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate) { - PartitionPruningData **partprunedata = prunestate->partprunedata; + PartitionedRelPruningData *partrelprunedata = prunestate->partrelprunedata; int i; for (i = 0; i < prunestate->num_partprunedata; i++) { - PartitionPruningData *pprune = partprunedata[i]; - PartitionedRelPruningData *prunedata = pprune->partrelprunedata; - int j; + PartitionedRelPruningData prunedata = partrelprunedata[i]; - for (j = 0; j < pprune->num_partrelprunedata; j++) - relation_close(prunedata[j].context.partrel, NoLock); + relation_close(prunedata.context.partrel, NoLock); } } @@ -1623,14 +1602,21 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans) for (i = 0; i < prunestate->num_partprunedata; i++) { - PartitionPruningData *pprune; PartitionedRelPruningData *prelprune; - pprune = prunestate->partprunedata[i]; - prelprune = &pprune->partrelprunedata[0]; + prelprune = &prunestate->partrelprunedata[i]; + + /* + * Only call find_matching_subplans_recurse for the entries + * corresponding to the topmost table of each partition hierarchy, as + * the others are accessed recursively via + * find_matching_subplans_recurse. + */ + if (!bms_is_member(i, prunestate->root_indexes)) + continue; /* Perform pruning without using PARAM_EXEC Params */ - find_matching_subplans_recurse(pprune, prelprune, true, &result); + find_matching_subplans_recurse(prunestate, prelprune, true, &result); /* Expression eval may have used space in node's ps_ExprContext too */ ResetExprContext(prelprune->context.planstate->ps_ExprContext); @@ -1694,61 +1680,57 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans) * 'present_parts'. 
*/ - for (i = 0; i < prunestate->num_partprunedata; i++) + for (i = prunestate->num_partprunedata - 1; i >= 0; i--) { - int j; - PartitionPruningData *prunedata; + PartitionedRelPruningData *pprune; + int nparts; + int k; - prunedata = prunestate->partprunedata[i]; + pprune = &prunestate->partrelprunedata[i]; + nparts = pprune->context.nparts; + /* We just rebuild present_parts from scratch */ + bms_free(pprune->present_parts); + pprune->present_parts = NULL; - for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--) + for (k = 0; k < nparts; k++) { - PartitionedRelPruningData *pprune; - int nparts; - int k; + int oldidx = pprune->subplan_map[k]; + int subidx; - pprune = &prunedata->partrelprunedata[j]; - nparts = pprune->context.nparts; - /* We just rebuild present_parts from scratch */ - bms_free(pprune->present_parts); - pprune->present_parts = NULL; - - for (k = 0; k < nparts; k++) + /* + * If this partition is a leaf partition, then update its + * subplan index. The new index may have become -1 if the + * subplan was pruned above, or it may have changed to a + * lower value if some subplans earlier in the list were + * being removed. + */ + if (oldidx >= 0) { - int oldidx = pprune->subplan_map[k]; - int subidx; + Assert(oldidx < nsubplans); + pprune->subplan_map[k] = new_subplan_indexes[oldidx]; - /* - * If this partition existed as a subplan then change the - * old subplan index to the new subplan index. The new - * index may become -1 if the partition was pruned above, - * or it may just come earlier in the subplan list due to - * some subplans being removed earlier in the list. If - * it's a subpartition, add it to present_parts unless - * it's entirely pruned. - */ - if (oldidx >= 0) - { - Assert(oldidx < nsubplans); - pprune->subplan_map[k] = new_subplan_indexes[oldidx]; + /* Add to present_parts if the subplan wasn't pruned. 
*/ + if (new_subplan_indexes[oldidx] >= 0) + pprune->present_parts = + bms_add_member(pprune->present_parts, k); + } + /* + * If this is a partitioned table, add to present_parts only + * if at least one of its partitions survived pruning. + */ + else if ((subidx = pprune->subpart_map[k]) >= 0) + { + PartitionedRelPruningData *subprune; - if (new_subplan_indexes[oldidx] >= 0) - pprune->present_parts = - bms_add_member(pprune->present_parts, k); - } - else if ((subidx = pprune->subpart_map[k]) >= 0) - { - PartitionedRelPruningData *subprune; + subprune = &prunestate->partrelprunedata[subidx]; - subprune = &prunedata->partrelprunedata[subidx]; - - if (!bms_is_empty(subprune->present_parts)) - pprune->present_parts = - bms_add_member(pprune->present_parts, k); - } + if (!bms_is_empty(subprune->present_parts)) + pprune->present_parts = + bms_add_member(pprune->present_parts, k); } } } + pfree(new_subplan_indexes); } @@ -1777,18 +1759,26 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate) for (i = 0; i < prunestate->num_partprunedata; i++) { - PartitionPruningData *pprune; PartitionedRelPruningData *prelprune; - pprune = prunestate->partprunedata[i]; - prelprune = &pprune->partrelprunedata[0]; + prelprune = &prunestate->partrelprunedata[i]; - find_matching_subplans_recurse(pprune, prelprune, false, &result); + /* + * Only call find_matching_subplans_recurse for the entries + * corresponding to the topmost table of each partition hierarchy, as + * the others are accessed recursively via + * find_matching_subplans_recurse. 
+ */ + if (!bms_is_member(i, prunestate->root_indexes)) + continue; + + find_matching_subplans_recurse(prunestate, prelprune, false, &result); /* Expression eval may have used space in node's ps_ExprContext too */ ResetExprContext(prelprune->context.planstate->ps_ExprContext); } + MemoryContextSwitchTo(oldcontext); /* Copy result out of the temp context before we reset it */ @@ -1810,7 +1800,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate) * Adds valid (non-prunable) subplan IDs to *validsubplans */ static void -find_matching_subplans_recurse(PartitionPruningData *pprune, +find_matching_subplans_recurse(PartitionPruneState *prunestate, PartitionedRelPruningData *prelprune, bool initial_prune, Bitmapset **validsubplans) @@ -1854,8 +1844,8 @@ find_matching_subplans_recurse(PartitionPruningData *pprune, int partidx = prelprune->subpart_map[i]; if (partidx >= 0) - find_matching_subplans_recurse(pprune, - &pprune->partrelprunedata[partidx], + find_matching_subplans_recurse(prunestate, + &prunestate->partrelprunedata[partidx], initial_prune, validsubplans); else { diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 7c8220cf65..a06358b048 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -1183,6 +1183,7 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from) PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo); COPY_NODE_FIELD(prune_infos); + COPY_BITMAPSET_FIELD(root_indexes); COPY_BITMAPSET_FIELD(other_subplans); return newnode; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 6269f474d2..391cd53dcf 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1018,6 +1018,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node) WRITE_NODE_TYPE("PARTITIONPRUNEINFO"); WRITE_NODE_FIELD(prune_infos); + WRITE_BITMAPSET_FIELD(root_indexes); WRITE_BITMAPSET_FIELD(other_subplans); } diff --git a/src/backend/nodes/readfuncs.c 
b/src/backend/nodes/readfuncs.c index 3254524223..c565cfad92 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -2330,6 +2330,7 @@ _readPartitionPruneInfo(void) READ_LOCALS(PartitionPruneInfo); READ_NODE_FIELD(prune_infos); + READ_BITMAPSET_FIELD(root_indexes); READ_BITMAPSET_FIELD(other_subplans); READ_DONE(); diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index f9e6ad3ab7..c7872661c4 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1033,6 +1033,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path) ListCell *subpaths; RelOptInfo *rel = best_path->path.parent; PartitionPruneInfo *partpruneinfo = NULL; + List *flattened_partitioned_rels = NIL; /* * The subpaths list could be empty, if every child was proven empty by @@ -1083,6 +1084,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path) prunequal = extract_actual_clauses(rel->baserestrictinfo, false); + flattened_partitioned_rels = + flatten_partitioned_rels(best_path->partitioned_rels); + if (best_path->path.param_info) { List *prmquals = best_path->path.param_info->ppi_clauses; @@ -1098,6 +1102,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path) partpruneinfo = make_partition_pruneinfo(root, rel, best_path->partitioned_rels, + flattened_partitioned_rels, best_path->subpaths, prunequal); } @@ -1109,7 +1114,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path) */ plan = make_append(subplans, best_path->first_partial_path, - tlist, best_path->partitioned_rels, + tlist, flattened_partitioned_rels, partpruneinfo); copy_generic_path_info(&plan->plan, (Path *) best_path); @@ -1135,6 +1140,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) ListCell *subpaths; RelOptInfo *rel = best_path->path.parent; PartitionPruneInfo *partpruneinfo = NULL; + List *flattened_partitioned_rels = NIL; /* * We don't have the actual 
creation of the MergeAppend node split out @@ -1233,6 +1239,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) prunequal = extract_actual_clauses(rel->baserestrictinfo, false); + flattened_partitioned_rels = + flatten_partitioned_rels(best_path->partitioned_rels); + if (best_path->path.param_info) { @@ -1247,12 +1256,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) if (prunequal != NIL) partpruneinfo = make_partition_pruneinfo(root, rel, - best_path->partitioned_rels, - best_path->subpaths, prunequal); + best_path->partitioned_rels, + flattened_partitioned_rels, + best_path->subpaths, prunequal); } - node->partitioned_rels = - flatten_partitioned_rels(best_path->partitioned_rels); + node->partitioned_rels = flattened_partitioned_rels; node->mergeplans = subplans; node->part_prune_info = partpruneinfo; @@ -5006,7 +5015,7 @@ bitmap_subplan_mark_shared(Plan *plan) /* * flatten_partitioned_rels * Convert List of Lists into a single List with all elements from the -* sub-lists. + * sub-lists. 
*/ static List * flatten_partitioned_rels(List *partitioned_rels) @@ -5380,8 +5389,9 @@ make_append(List *appendplans, int first_partial_plan, plan->righttree = NULL; node->appendplans = appendplans; node->first_partial_plan = first_partial_plan; - node->partitioned_rels = flatten_partitioned_rels(partitioned_rels); + node->partitioned_rels = partitioned_rels; node->part_prune_info = partpruneinfo; + return node; } diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index 9ce216c28b..ba06ff7119 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -114,6 +114,7 @@ typedef struct PruneStepResult static List *make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, int *relid_subplan_map, + int *relid_subpart_map, List *partitioned_rels, List *prunequal, Bitmapset **matchedsubplans); static List *gen_partprune_steps(RelOptInfo *rel, List *clauses, @@ -195,12 +196,16 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context, */ PartitionPruneInfo * make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, - List *partitioned_rels, List *subpaths, + List *partitioned_rels, + List *flattened_partitioned_rels, + List *subpaths, List *prunequal) { PartitionPruneInfo *pruneinfo; Bitmapset *allmatchedsubplans = NULL; + Bitmapset *root_indexes = NULL; int *relid_subplan_map; + int *relid_subpart_map; ListCell *lc; List *prunerelinfos; int i; @@ -230,6 +235,38 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, relid_subplan_map[pathrel->relid] = i++; } + /* + * Construct a temporary array to map from planner relids to index of the + * partitioned_rel. For convenience, we use 1-based indexes here, so that + * zero can represent an un-filled array entry. 
+ * + * Also, since we're going to flatten the list before putting it into the + * plan, use indexes into the flattened list in the mapping arrays of + * resulting PartitionedRelPruneInfo nodes, instead of indexes into + * individual sub-lists. + */ + relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size); + + /* + * relid_subpart_map maps relid of a non-leaf partition to the index in + * 'partitioned_rels' of that rel (which will also be the index in the + * returned PartitionedRelPruneInfo list of the info for that partition). + */ + i = 1; + foreach(lc, flattened_partitioned_rels) + { + Index rti = lfirst_int(lc); + + Assert(rti < root->simple_rel_array_size); + /* No duplicates please */ + Assert(relid_subpart_map[rti] == 0); + /* Same rel cannot be both leaf and non-leaf */ + Assert(relid_subplan_map[rti] == 0); + + relid_subpart_map[rti] = i++; + } + + Assert(partitioned_rels->type == T_List); prunerelinfos = NIL; @@ -240,22 +277,29 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, List *rels = lfirst(lc); List *prelinfolist; Bitmapset *matchedsubplans = NULL; + Index root_rt_index = linitial_int(rels); prelinfolist = make_partitionedrel_pruneinfo(root, parentrel, relid_subplan_map, + relid_subpart_map, rels, prunequal, &matchedsubplans); /* When pruning is possible, record the matched subplans */ if (prelinfolist != NIL) { - prunerelinfos = lappend(prunerelinfos, prelinfolist); + prunerelinfos = list_concat(prunerelinfos, + list_copy(prelinfolist)); allmatchedsubplans = bms_join(matchedsubplans, allmatchedsubplans); + root_indexes = + bms_add_member(root_indexes, + relid_subpart_map[root_rt_index] - 1); } } pfree(relid_subplan_map); + pfree(relid_subpart_map); /* * if none of the partition hierarchies had any useful run-time pruning @@ -287,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, else pruneinfo->other_subplans = NULL; + /* There should be at least one member. 
*/ + Assert(bms_num_members(root_indexes) > 0); + pruneinfo->root_indexes = root_indexes; + return pruneinfo; } @@ -310,45 +358,17 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, */ static List * make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, - int *relid_subplan_map, + int *relid_subplan_map, int *relid_subpart_map, List *partitioned_rels, List *prunequal, Bitmapset **matchedsubplans) { RelOptInfo *targetpart = NULL; List *prelinfolist = NIL; bool doruntimeprune = false; - bool hascontradictingquals = false; ListCell *lc; - int *relid_subpart_map; Bitmapset *subplansfound = NULL; int i; - /* - * Construct a temporary array to map from planner relids to index of the - * partitioned_rel. For convenience, we use 1-based indexes here, so that - * zero can represent an un-filled array entry. - */ - relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size); - - /* - * relid_subpart_map maps relid of a non-leaf partition to the index in - * 'partitioned_rels' of that rel (which will also be the index in the - * returned PartitionedRelPruneInfo list of the info for that partition). 
- */ - i = 1; - foreach(lc, partitioned_rels) - { - Index rti = lfirst_int(lc); - - Assert(rti < root->simple_rel_array_size); - /* No duplicates please */ - Assert(relid_subpart_map[rti] == 0); - /* Same rel cannot be both leaf and non-leaf */ - Assert(relid_subplan_map[rti] == 0); - - relid_subpart_map[rti] = i++; - } - /* We now build a PartitionedRelPruneInfo for each partitioned rel */ foreach(lc, partitioned_rels) { @@ -477,8 +497,6 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, prelinfolist = lappend(prelinfolist, prelinfo); } - pfree(relid_subpart_map); - if (!doruntimeprune) return NIL; diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h index 4327fd4cb1..46f23f45de 100644 --- a/src/include/executor/execPartition.h +++ b/src/include/executor/execPartition.h @@ -147,17 +147,6 @@ typedef struct PartitionedRelPruningData bool do_exec_prune; } PartitionedRelPruningData; -/* - * PartitionPruningData - Encapsulates an array of PartitionedRelPruningData - * which belong to a single partition hierarchy containing 1 or more - * partitions. - */ -typedef struct PartitionPruningData -{ - int num_partrelprunedata; - PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER]; -} PartitionPruningData; - /*----------------------- * PartitionPruneState - State object required for plan nodes to perform * run-time partition pruning. @@ -185,9 +174,16 @@ typedef struct PartitionPruningData * These must not be pruned. * prune_context A short-lived memory context in which to execute the * partition pruning functions. - * partprunedata Array of PartitionPruningData pointers for the plan's - * partitioned relation, ordered such that parent tables - * appear before children (hence, topmost table is first). 
+ * root_indexes Contains indexes of PartitionedRelPruningData in the + * array below ('partprunedata') of the topmost + * partitioned tables of each partition hierarchy + * partprunedata Array of pointers of PartitionedRelPruningData structs + * of partitioned relations contained in the plan, + * ordered such that parent tables appear before children + * (hence, the topmost table always appears first in the + * sequence of PartitionedRelPruningData's of partitioned + * tables in a given partition hierarchy and its index + * is contained in 'root_indexes' as mentioned above). *----------------------- */ typedef struct PartitionPruneState @@ -198,7 +194,8 @@ typedef struct PartitionPruneState Bitmapset *execparamids; Bitmapset *other_subplans; MemoryContext prune_context; - PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER]; + Bitmapset *root_indexes; + PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER]; } PartitionPruneState; extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index a1a782d2f6..c057a5fc33 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -1059,7 +1059,10 @@ typedef struct PlanRowMark * PartitionPruneInfo- - Details required to allow the executor to prune * partitions. * - * prune_infos List of Lists containing PartitionedRelPruneInfo + * prune_infos List of PartitionedRelPruneInfo's + * root_indexes Indexes of PartitionedRelPruneInfo's in 'prune_infos' + * of the topmost partitioned tables in partition + * hierarchies contained in the plan * other_subplans Indexes of any subplans which are not accounted for * by any of the PartitionedRelPruneInfo stored in * 'prune_infos'. 
@@ -1068,6 +1071,7 @@ typedef struct PartitionPruneInfo { NodeTag type; List *prune_infos; + Bitmapset *root_indexes; Bitmapset *other_subplans; } PartitionPruneInfo; diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h index df3bcb737d..79398d1cc1 100644 --- a/src/include/partitioning/partprune.h +++ b/src/include/partitioning/partprune.h @@ -77,6 +77,7 @@ typedef struct PartitionPruneContext extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, List *partitioned_rels, + List *flattened_partitioned_rels, List *subpaths, List *prunequal); extern Relids prune_append_rel_partitions(RelOptInfo *rel); extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index dac789d414..b5f796f5ed 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1669,6 +1669,19 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 				new_subplan_indexes[i] = newidx++;
 			else
 				new_subplan_indexes[i] = -1;	/* Newly pruned */
+
+			/*
+			 * If a subplan in other_subplans got its index updated, update
+			 * other_subplans too.
+			 */
+			if (bms_is_member(i, prunestate->other_subplans))
+			{
+				prunestate->other_subplans =
+					bms_del_member(prunestate->other_subplans, i);
+				prunestate->other_subplans =
+					bms_add_member(prunestate->other_subplans,
+								   new_subplan_indexes[i]);
+			}
 		}
 
 		/*
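To make the effect of the last hunk easier to see in isolation, here is a rough standalone sketch of the same renumbering. It is not the executor code. It assumes a uint64_t bitmask in place of Bitmapset (so at most 64 subplans), and SubplanSet, set_has() and remap_other_subplans() are names made up for this example. Unlike the hunk, which edits other_subplans in place with bms_del_member()/bms_add_member(), the sketch builds a fresh set, purely to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: a uint64_t bitmask stands in for PostgreSQL's
 * Bitmapset, so at most 64 subplans are supported here.
 */
typedef uint64_t SubplanSet;

/* Return 1 if bit i is set in s, else 0. */
static int
set_has(SubplanSet s, int i)
{
	return (int) ((s >> i) & 1);
}

/*
 * Renumber the surviving subplans consecutively from 0 and record the
 * mapping in new_index[] (-1 for pruned subplans).  Returns 'other'
 * translated to the new numbering.  Members of 'other' are expected to
 * be in 'valid', since those subplans are never pruned; any member that
 * is not in 'valid' is simply dropped.
 */
SubplanSet
remap_other_subplans(SubplanSet valid, SubplanSet other,
					 int nsubplans, int *new_index)
{
	SubplanSet	remapped = 0;
	int			newidx = 0;

	assert(nsubplans >= 0 && nsubplans <= 64);

	for (int i = 0; i < nsubplans; i++)
	{
		if (!set_has(valid, i))
		{
			new_index[i] = -1;	/* newly pruned */
			continue;
		}

		new_index[i] = newidx++;

		/* Carry membership over under the subplan's new index. */
		if (set_has(other, i))
			remapped |= (SubplanSet) 1 << new_index[i];
	}

	return remapped;
}
```

With five subplans of which 1, 3 and 4 survive, and subplan 4 in the "other" set, the survivors are renumbered 0, 1 and 2, so the returned set contains only index 2.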