It is fairly well known that the planner doesn't perform well with more than a few hundred partitions, even when only a handful of partitions are ultimately included in the plan. The situation improved a bit in PG 11, where we replaced the older method of pruning partitions one-by-one using constraint exclusion with a much faster method that finds relevant partitions by using partitioning metadata. However, we could only use it for SELECT queries, because UPDATE/DELETE are handled by a completely different code path, whose structure doesn't allow it to call the new pruning module's functionality. In fact, not being able to use the new pruning is not the only problem for UPDATE/DELETE; more on that further below.
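
To make the difference between the two pruning styles concrete, here is a minimal sketch; it is not PostgreSQL code. It assumes a toy range-partitioned table in which partition i holds keys in [bounds[i], bounds[i+1]), and the names bounds, prune_one_by_one and prune_with_metadata are made up for illustration. The first function mimics testing each partition's constraint in turn; the second consults the parent's sorted bounds with a binary search, which is roughly the idea behind metadata-based pruning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: partition i holds keys in [bounds[i], bounds[i+1]). */
#define NPARTS 8
static const int bounds[NPARTS + 1] = {0, 10, 20, 30, 40, 50, 60, 70, 80};

/*
 * Old style: test the key against every partition's constraint in turn.
 * Returns the matching partition index or -1; *nchecks counts the tests.
 */
static int
prune_one_by_one(int key, int *nchecks)
{
    int found = -1;

    assert(nchecks != NULL);
    for (int i = 0; i < NPARTS; i++)
    {
        (*nchecks)++;
        if (key >= bounds[i] && key < bounds[i + 1])
            found = i;
    }
    return found;
}

/*
 * New style: binary-search the parent's sorted bound array.
 * Returns the matching partition index or -1; *nchecks counts comparisons
 * made inside the search loop.
 */
static int
prune_with_metadata(int key, int *nchecks)
{
    int lo = 0;
    int hi = NPARTS;

    assert(nchecks != NULL);
    if (key < bounds[0] || key >= bounds[NPARTS])
        return -1;              /* no partition can hold this key */

    /* Invariant: bounds[lo] <= key < bounds[hi]. */
    while (hi - lo > 1)
    {
        int mid = lo + (hi - lo) / 2;

        (*nchecks)++;
        if (key >= bounds[mid])
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

With 8 partitions and key 42, both functions return partition 4, but the first performs 8 checks and the second 3. The gap widens as the partition count grows, since the first is linear in the number of partitions and the second logarithmic.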
While the situation improved with the new pruning where it could be used, there are still overheads in the way the planner handles partitions. As things stand today, it spends cycles and allocates memory for partitions even before pruning is performed, meaning most of that effort may go to partitions that were better left untouched. Currently, in an early phase of planning, the planner will lock and heap_open *all* partitions, create range table entries and AppendRelInfos for them, and finally initialize RelOptInfos for them, even touching the disk file of each partition in the process. All of that processing is in vain for partitions that are pruned, because they won't be included in the final plan. This problem grows worse as the number of partitions grows into the thousands, because the range table grows too big.

That could be fixed by delaying all that per-partition activity to a point where pruning has already been performed, such as somewhere downstream of query_planner, so that we know which partitions to open and create planning data structures for. But before we can do that, we must deal with the fact that UPDATE/DELETE won't be able to cope with it, because the code path that currently handles the planning of UPDATE/DELETE on partitioned tables (inheritance_planner, called from subquery_planner) relies on AppendRelInfos for all partitions having been initialized by an earlier planning phase. Delaying it to query_planner would be too late, because inheritance_planner calls query_planner for each partition, not for the parent. That is, if query_planner, which is downstream of inheritance_planner, were in charge of determining which partitions to open, the latter wouldn't know which partitions to call the former for. :)

That would be fixed if this ordering dependency no longer existed, which is what I propose to do with the attached patch 0001. I've tried to describe how the patch manages to do that in its commit message, but I'll summarize here.
As things stand today, inheritance_planner modifies the query for each leaf partition to make the partition act as the query's result relation instead of the original partitioned table, and calls grouping_planner on the query. That means anything that's joined to the partitioned table appears instead to be joined to the partition, and join paths are generated accordingly. Also, the resulting path's targetlist is adjusted to be suitable for the result partition. Upon studying how this works, I concluded that the same result can be achieved if we call grouping_planner only once and repeat, for each partition, only the portions of query_planner's and grouping_planner's processing that generate the join paths and the appropriate target list, respectively. That way, we can rely on query_planner to determine the result partitions for us, which in turn relies on the faster partprune.c based method of pruning. That speeds things up in two ways: pruning is faster, and we no longer repeat common processing for each partition.

With 0001 in place, nothing requires that partitions be opened by an earlier planning phase, so I propose patch 0002, which refactors the opening of partitions and the creation of planner data structures for them such that it is now performed after pruning. However, it doesn't do anything about the fact that all partitions are still locked in the earlier phase.

With various overheads gone thanks to 0001 and 0002, locking of all partitions via find_all_inheritors can be seen as the single largest bottleneck, which 0003 tries to address. I've kept it as a separate patch, because I'll need to think a bit more before I can say that it's actually safe to defer locking to late planning, mainly due to the concern about the change in the order of locking from the current method. I'm attaching it here because I also want to show the performance improvement we can expect with it.

I measured the gain in performance due to each patch on a modest virtual machine.
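
As an aside on the restructuring proposed in 0001, the change in control flow can be modelled with a small counting sketch. This is only an illustration under stated assumptions, not the patch's code: PlanWork, partition_survives, plan_per_partition and plan_once_then_partitions are hypothetical names, and the pruning rule (only partition key % nparts survives) is a stand-in for real hash partition pruning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for the kinds of planning work discussed above. */
typedef struct PlanWork
{
    int common_passes;      /* full grouping_planner-style passes */
    int exclusion_checks;   /* one-by-one constraint exclusion tests */
    int partition_steps;    /* join planning + targetlist adjustment */
} PlanWork;

/* Toy pruning rule (an assumption): only partition key % nparts survives. */
static bool
partition_survives(int part, int nparts, int key)
{
    return part == key % nparts;
}

/* Current scheme: one full planning pass per leaf partition. */
static PlanWork
plan_per_partition(int nparts, int key)
{
    PlanWork w = {0, 0, 0};

    assert(nparts > 0 && key >= 0);
    for (int part = 0; part < nparts; part++)
    {
        w.common_passes++;      /* common work repeated every time */
        w.exclusion_checks++;   /* pruning decided partition by partition */
        if (partition_survives(part, nparts, key))
            w.partition_steps++;
    }
    return w;
}

/* Proposed scheme: plan once, prune, then repeat only per-partition steps. */
static PlanWork
plan_once_then_partitions(int nparts, int key)
{
    PlanWork w = {0, 0, 0};
    int survivor;

    assert(nparts > 0 && key >= 0);
    w.common_passes = 1;        /* common work done a single time */

    /* In this toy model, metadata-based pruning finds the survivor directly. */
    survivor = key % nparts;
    if (partition_survives(survivor, nparts, key))
        w.partition_steps++;
    return w;
}
```

For 1024 partitions and a key that maps to one of them, the first function records 1024 common passes and 1024 exclusion checks, while the second records a single common pass and no exclusion checks; both perform the per-partition steps exactly once.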
Details of the measurement and results follow.

* Benchmark scripts

update.sql
update ht set a = 0 where b = 1;

select.sql
select * from ht where b = 1;

* Table:

create table ht (a int, b int) partition by hash (b)
create table ht_1 partition of ht for values with (modulus N, remainder 0)
..
create table ht_N partition of ht for values with (modulus N, remainder N-1)

* Rounded tps with update.sql and select.sql against regular table (nparts = 0) and partitioned table with various partition counts:

pgbench -n -T 60 -f update.sql

nparts  master    0001    0002    0003
======  ======    ====    ====    ====
0         2856    2893    2862    2816
8          507    1115    1447    1872
16         260     765    1173    1892
32         119     483     922    1884
64          59     282     615    1881
128         29     153     378    1835
256         14      79     210    1803
512          5      40     113    1728
1024         2      17      57    1616
2048         0*      9      30    1471
4096         0+      4      15    1236
8192         0=      2       7     975

* 0.46
+ 0.0064
= 0 (OOM on a virtual machine with 4GB RAM)

As can be seen here, 0001 is a big help for update queries.

pgbench -n -T 60 -f select.sql

For a select query that doesn't contain a join and needs to scan only one partition:

nparts  master    0001    0002    0003
======  ======    ====    ====    ====
0         2290    2329    2319    2268
8         1058    1077    1414    1788
16         711     729    1124    1789
32         450     475     879    1773
64         265     272     603    1765
128        146     149     371    1685
256         76      77     214    1678
512         39      39     112    1636
1024        16      17      59    1525
2048         8       9      29    1416
4096         4       4      15    1195
8192         2       2       7     932

Actually, here we get almost the same numbers with 0001 as with master, because 0001 changes nothing for SELECT queries. We start seeing improvement with 0002, the patch to delay opening partitions.

Thanks,
Amit
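
For anyone wanting to reproduce the table setup above for a given N, the elided ht_1 .. ht_N statements can be generated mechanically. The following is a minimal sketch, not part of the patch set; format_partition_ddl and write_schema are hypothetical helper names, and the trailing semicolons are added here so the output can be fed to psql.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the DDL for partition number 'part' (1-based) out of 'nparts'
 * into buf.  Returns the statement length, or -1 if buf is too small.
 */
static int
format_partition_ddl(char *buf, size_t buflen, int nparts, int part)
{
    int n;

    assert(buf != NULL && nparts > 0 && part >= 1 && part <= nparts);
    n = snprintf(buf, buflen,
                 "create table ht_%d partition of ht "
                 "for values with (modulus %d, remainder %d);",
                 part, nparts, part - 1);
    return (n >= 0 && (size_t) n < buflen) ? n : -1;
}

/*
 * Write the parent table definition and all 'nparts' partitions to 'out'.
 * Returns nparts on success, -1 if a statement did not fit the buffer.
 */
static int
write_schema(FILE *out, int nparts)
{
    char buf[128];

    fputs("create table ht (a int, b int) partition by hash (b);\n", out);
    for (int part = 1; part <= nparts; part++)
    {
        if (format_partition_ddl(buf, sizeof(buf), nparts, part) < 0)
            return -1;
        assert(strlen(buf) < sizeof(buf));
        fprintf(out, "%s\n", buf);
    }
    return nparts;
}
```

Calling write_schema(stdout, 8) prints the parent definition followed by the statements for ht_1 through ht_8, with remainders 0 through 7.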
From 060bd2445ea9cba9adadd73505689d6f06583ee8 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Fri, 24 Aug 2018 12:39:36 +0900
Subject: [PATCH 1/3] Overhaul partitioned table update/delete planning

Current method, inheritance_planner, applies grouping_planner and
hence query_planner to the query repeatedly with each leaf partition
replacing the root parent as the query's result relation. One big
drawback of this approach is that it cannot use partprune.c to
perform partition pruning on the partitioned result relation, because
it can only be invoked if query_planner sees the partitioned relation
itself in the query.  That is not true with the existing method,
because as mentioned above, query_planner is invoked with the
partitioned relation replaced with individual leaf partitions.

While most of the work in each repetition of grouping_planner (and
query_planner) is the same, a couple of things may differ from
partition to partition -- 1. Join planning may produce different
Paths for joining against different result partitions, 2.
grouping_planner may produce different top-level target lists for
different partitions, based on their TupleDescs.

This commit rearranges things so that only the planning steps that
affect 1 and 2 above are repeated for partitions that are selected by
query_planner by applying partprune.c based pruning to the original
partitioned result rel.

That makes things faster because 1. partprune.c based pruning is
used instead of using constraint exclusion for each partition, 2.
grouping_planner (and query_planner) is invoked only once instead of
for every partition, thus saving cycles and memory.

This still doesn't help much if no partitions are pruned, because
we still repeat join planning and make copies of the query for
each partition, but for common cases where only a handful of
partitions remain after pruning, this makes things significantly
faster.
---
 doc/src/sgml/ddl.sgml                        |  15 +-
 src/backend/optimizer/path/allpaths.c        |  97 ++++++-
 src/backend/optimizer/plan/planmain.c        |   4 +-
 src/backend/optimizer/plan/planner.c         | 378 ++++++++++++++++++++-------
 src/backend/optimizer/prep/prepunion.c       |  28 +-
 src/backend/optimizer/util/plancat.c         |  30 ---
 src/test/regress/expected/partition_join.out |   4 +-
 7 files changed, 416 insertions(+), 140 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b5ed1b7939..53c479fbb8 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3933,16 +3933,6 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
     <xref linkend="guc-enable-partition-pruning"/> setting.
    </para>
 
-   <note>
-    <para>
-     Currently, pruning of partitions during the planning of an
-     <command>UPDATE</command> or <command>DELETE</command> command is
-     implemented using the constraint exclusion method (however, it is
-     controlled by the <literal>enable_partition_pruning</literal> rather than
-     <literal>constraint_exclusion</literal>) &mdash; see the following section
-     for details and caveats that apply.
-    </para>
-
     <para>
      Execution-time partition pruning currently occurs for the
      <literal>Append</literal> and <literal>MergeAppend</literal> node types.
@@ -3964,9 +3954,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
 
    <para>
     <firstterm>Constraint exclusion</firstterm> is a query optimization
-    technique similar to partition pruning.  While it is primarily used
-    for partitioning implemented using the legacy inheritance method, it can be
-    used for other purposes, including with declarative partitioning.
+    technique similar to partition pruning.  It is primarily used
+    for partitioning implemented using the legacy inheritance method.
    </para>
 
    <para>
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0e80aeb65c..5937c0436a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -36,6 +36,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
@@ -119,6 +120,9 @@ static void set_namedtuplestore_pathlist(PlannerInfo *root, RelOptInfo *rel,
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                        RangeTblEntry *rte);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
+static RelOptInfo *partitionwise_make_rel_from_joinlist(PlannerInfo *root,
+                                     RelOptInfo *parent,
+                                     List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                           pushdown_safety_info *safetyInfo);
 static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
@@ -181,13 +185,30 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 
     /*
      * Generate access paths for the entire join tree.
+     *
+     * If we're doing this for an UPDATE or DELETE query whose target is a
+     * partitioned table, we must do the join planning against each of its
+     * leaf partitions instead.
      */
-    rel = make_rel_from_joinlist(root, joinlist);
+    if (root->parse->resultRelation &&
+        root->parse->commandType != CMD_INSERT &&
+        root->simple_rel_array[root->parse->resultRelation] &&
+        root->simple_rel_array[root->parse->resultRelation]->part_scheme)
+    {
+        RelOptInfo *rootrel = root->simple_rel_array[root->parse->resultRelation];
 
-    /*
-     * The result should join all and only the query's base rels.
-     */
-    Assert(bms_equal(rel->relids, root->all_baserels));
+        rel = partitionwise_make_rel_from_joinlist(root, rootrel, joinlist);
+    }
+    else
+    {
+        rel = make_rel_from_joinlist(root, joinlist);
+
+        /*
+         * The result should join all and only the query's base rels.
+         */
+        Assert(bms_equal(rel->relids, root->all_baserels));
+
+    }
 
     return rel;
 }
@@ -2591,6 +2612,72 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 }
 
 /*
+ * partitionwise_make_rel_from_joinlist
+ *      performs join planning against each of the leaf partitions contained
+ *      in the partition tree whose root relation is 'parent'
+ *
+ * Recursively called for each partitioned table contained in a given
+ * partition tree.
+ */
+static RelOptInfo *
+partitionwise_make_rel_from_joinlist(PlannerInfo *root,
+                                     RelOptInfo *parent,
+                                     List *joinlist)
+{
+    int         i;
+
+    Assert(root->parse->resultRelation != 0);
+    Assert(parent->part_scheme != NULL);
+
+    for (i = 0; i < parent->nparts; i++)
+    {
+        RelOptInfo *partrel = parent->part_rels[i];
+        AppendRelInfo *appinfo;
+        List       *translated_joinlist;
+        List       *saved_join_info_list = list_copy(root->join_info_list);
+
+        /* Ignore pruned partitions. */
+        if (IS_DUMMY_REL(partrel))
+            continue;
+
+        /*
+         * Hack to make the join planning code believe that 'partrel' can
+         * be joined against.
+         */
+        partrel->reloptkind = RELOPT_BASEREL;
+
+        /*
+         * Replace references to the parent rel in expressions relevant to join
+         * planning.
+         */
+        appinfo = root->append_rel_array[partrel->relid];
+        translated_joinlist = (List *)
+                        adjust_appendrel_attrs(root, (Node *) joinlist,
+                                               1, &appinfo);
+        root->join_info_list = (List *)
+                        adjust_appendrel_attrs(root,
+                                               (Node *) root->join_info_list,
+                                               1, &appinfo);
+        /* Reset join planning data structures for a new partition. */
+        root->join_rel_list = NIL;
+        root->join_rel_hash = NULL;
+
+        /* Recurse if the partition is itself a partitioned table. */
+        if (partrel->part_scheme != NULL)
+            partrel = partitionwise_make_rel_from_joinlist(root, partrel,
+                                                           translated_joinlist);
+        else
+            /* Perform the join planning and save the resulting relation. */
+            parent->part_rels[i] =
+                        make_rel_from_joinlist(root, translated_joinlist);
+
+        root->join_info_list = saved_join_info_list;
+    }
+
+    return parent;
+}
+
+/*
  * make_rel_from_joinlist
  *      Build access paths using a "joinlist" to guide the join path search.
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc70c4..3f0d80eaa6 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -266,7 +266,9 @@ query_planner(PlannerInfo *root, List *tlist,
 
     /* Check that we got at least one usable path */
     if (!final_rel || !final_rel->cheapest_total_path ||
-        final_rel->cheapest_total_path->param_info != NULL)
+        final_rel->cheapest_total_path->param_info != NULL ||
+        (final_rel->relid == root->parse->resultRelation &&
+         root->parse->commandType == CMD_INSERT))
         elog(ERROR, "failed to construct the join relation");
 
     return final_rel;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 96bf0601a8..076dbd3d62 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -238,6 +238,16 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
                      List *targetList,
                      List *groupClause);
 
+static void partitionwise_adjust_scanjoin_target(PlannerInfo *root,
+                                     RelOptInfo *parent,
+                                     List **partition_subroots,
+                                     List **partitioned_rels,
+                                     List **resultRelations,
+                                     List **subpaths,
+                                     List **WCOLists,
+                                     List **returningLists,
+                                     List **rowMarks);
+
 
 /*****************************************************************************
  *
@@ -959,7 +969,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
      * needs special processing, else go straight to grouping_planner.
      */
     if (parse->resultRelation &&
-        rt_fetch(parse->resultRelation, parse->rtable)->inh)
+        rt_fetch(parse->resultRelation, parse->rtable)->inh &&
+        rt_fetch(parse->resultRelation, parse->rtable)->relkind !=
+        RELKIND_PARTITIONED_TABLE)
         inheritance_planner(root);
     else
         grouping_planner(root, false, tuple_fraction);
@@ -1688,6 +1700,14 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
     RelOptInfo *current_rel;
     RelOptInfo *final_rel;
     ListCell   *lc;
+    List       *orig_parse_tlist = list_copy(parse->targetList);
+    List       *partition_subroots = NIL;
+    List       *partitioned_rels = NIL;
+    List       *partition_resultRelations = NIL;
+    List       *partition_subpaths = NIL;
+    List       *partition_WCOLists = NIL;
+    List       *partition_returningLists = NIL;
+    List       *partition_rowMarks = NIL;
 
     /* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
     if (parse->limitCount || parse->limitOffset)
@@ -2018,13 +2038,44 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
             scanjoin_targets_contain_srfs = NIL;
         }
 
-        /* Apply scan/join target. */
-        scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1
-            && equal(scanjoin_target->exprs, current_rel->reltarget->exprs);
-        apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
-                                       scanjoin_targets_contain_srfs,
-                                       scanjoin_target_parallel_safe,
-                                       scanjoin_target_same_exprs);
+        /*
+         * For an UPDATE/DELETE query whose target is partitioned table, we
+         * must generate the targetlist for each of its leaf partitions and
+         * apply that.
+         */
+        if (current_rel->reloptkind == RELOPT_BASEREL &&
+            current_rel->part_scheme &&
+            current_rel->relid == root->parse->resultRelation &&
+            parse->commandType != CMD_INSERT)
+        {
+            /*
+             * scanjoin_target shouldn't have changed from final_target,
+             * because UPDATE/DELETE doesn't support various features that
+             * would've required modifications that are performed above.
+             * That's important because we'll generate final_target freshly
+             * for each partition in partitionwise_adjust_scanjoin_target.
+             */
+            Assert(scanjoin_target == final_target);
+            root->parse->targetList = orig_parse_tlist;
+            partitionwise_adjust_scanjoin_target(root, current_rel,
+                                                 &partition_subroots,
+                                                 &partitioned_rels,
+                                                 &partition_resultRelations,
+                                                 &partition_subpaths,
+                                                 &partition_WCOLists,
+                                                 &partition_returningLists,
+                                                 &partition_rowMarks);
+        }
+        else
+        {
+            /* Apply scan/join target. */
+            scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1
+                && equal(scanjoin_target->exprs, current_rel->reltarget->exprs);
+            apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
+                                           scanjoin_targets_contain_srfs,
+                                           scanjoin_target_parallel_safe,
+                                           scanjoin_target_same_exprs);
+        }
 
         /*
          * Save the various upper-rel PathTargets we just computed into
@@ -2136,93 +2187,119 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
     final_rel->useridiscurrent = current_rel->useridiscurrent;
     final_rel->fdwroutine = current_rel->fdwroutine;
 
-    /*
-     * Generate paths for the final_rel.  Insert all surviving paths, with
-     * LockRows, Limit, and/or ModifyTable steps added if needed.
-     */
-    foreach(lc, current_rel->pathlist)
+    if (current_rel->reloptkind == RELOPT_BASEREL &&
+        current_rel->relid == root->parse->resultRelation &&
+        current_rel->part_scheme &&
+        parse->commandType != CMD_INSERT)
     {
-        Path       *path = (Path *) lfirst(lc);
-
-        /*
-         * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
-         * (Note: we intentionally test parse->rowMarks not root->rowMarks
-         * here.  If there are only non-locking rowmarks, they should be
-         * handled by the ModifyTable node instead.  However, root->rowMarks
-         * is what goes into the LockRows node.)
-         */
-        if (parse->rowMarks)
-        {
-            path = (Path *) create_lockrows_path(root, final_rel, path,
-                                                 root->rowMarks,
-                                                 SS_assign_special_param(root));
-        }
-
-        /*
-         * If there is a LIMIT/OFFSET clause, add the LIMIT node.
-         */
-        if (limit_needed(parse))
-        {
-            path = (Path *) create_limit_path(root, final_rel, path,
-                                              parse->limitOffset,
-                                              parse->limitCount,
-                                              offset_est, count_est);
-        }
-
-        /*
-         * If this is an INSERT/UPDATE/DELETE, and we're not being called from
-         * inheritance_planner, add the ModifyTable node.
-         */
-        if (parse->commandType != CMD_SELECT && !inheritance_update)
-        {
-            List       *withCheckOptionLists;
-            List       *returningLists;
-            List       *rowMarks;
-
-            /*
-             * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if
-             * needed.
-             */
-            if (parse->withCheckOptions)
-                withCheckOptionLists = list_make1(parse->withCheckOptions);
-            else
-                withCheckOptionLists = NIL;
-
-            if (parse->returningList)
-                returningLists = list_make1(parse->returningList);
-            else
-                returningLists = NIL;
-
-            /*
-             * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node
-             * will have dealt with fetching non-locked marked rows, else we
-             * need to have ModifyTable do that.
-             */
-            if (parse->rowMarks)
-                rowMarks = NIL;
-            else
-                rowMarks = root->rowMarks;
-
-            path = (Path *)
+        Path       *path = (Path *)
                 create_modifytable_path(root, final_rel,
                                         parse->commandType,
                                         parse->canSetTag,
                                         parse->resultRelation,
-                                        NIL,
-                                        false,
-                                        list_make1_int(parse->resultRelation),
-                                        list_make1(path),
-                                        list_make1(root),
-                                        withCheckOptionLists,
-                                        returningLists,
-                                        rowMarks,
-                                        parse->onConflict,
+                                        partitioned_rels,
+                                        root->partColsUpdated,
+                                        partition_resultRelations,
+                                        partition_subpaths,
+                                        partition_subroots,
+                                        partition_WCOLists,
+                                        partition_returningLists,
+                                        partition_rowMarks,
+                                        NULL,
                                         SS_assign_special_param(root));
-        }
-
-        /* And shove it into final_rel */
         add_path(final_rel, path);
     }
+    else
+    {
+        /*
+         * Generate paths for the final_rel.  Insert all surviving paths, with
+         * LockRows, Limit, and/or ModifyTable steps added if needed.
+         */
+        foreach(lc, current_rel->pathlist)
+        {
+            Path       *path = (Path *) lfirst(lc);
+
+            /*
+             * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows
+             * node.  (Note: we intentionally test parse->rowMarks not
+             * root->rowMarks here.  If there are only non-locking rowmarks,
+             * they should be handled by the ModifyTable node instead.
+             * However, root->rowMarks is what goes into the LockRows node.)
+             */
+            if (parse->rowMarks)
+            {
+                path = (Path *)
+                    create_lockrows_path(root, final_rel, path,
+                                         root->rowMarks,
+                                         SS_assign_special_param(root));
+            }
+
+            /*
+             * If there is a LIMIT/OFFSET clause, add the LIMIT node.
+             */
+            if (limit_needed(parse))
+            {
+                path = (Path *) create_limit_path(root, final_rel, path,
+                                                  parse->limitOffset,
+                                                  parse->limitCount,
+                                                  offset_est, count_est);
+            }
+
+            /*
+             * If this is an INSERT/UPDATE/DELETE, and we're not being called
+             * from inheritance_planner, add the ModifyTable node.
+             */
+            if (parse->commandType != CMD_SELECT && !inheritance_update)
+            {
+                List       *withCheckOptionLists;
+                List       *returningLists;
+                List       *rowMarks;
+
+                /*
+                 * Set up the WITH CHECK OPTION and RETURNING lists-of-lists,
+                 * if needed.
+                 */
+                if (parse->withCheckOptions)
+                    withCheckOptionLists = list_make1(parse->withCheckOptions);
+                else
+                    withCheckOptionLists = NIL;
+
+                if (parse->returningList)
+                    returningLists = list_make1(parse->returningList);
+                else
+                    returningLists = NIL;
+
+                /*
+                 * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows
+                 * node will have dealt with fetching non-locked marked rows,
+                 * else we need to have ModifyTable do that.
+                 */
+                if (parse->rowMarks)
+                    rowMarks = NIL;
+                else
+                    rowMarks = root->rowMarks;
+
+                path = (Path *)
+                    create_modifytable_path(root, final_rel,
+                                            parse->commandType,
+                                            parse->canSetTag,
+                                            parse->resultRelation,
+                                            NIL,
+                                            false,
+                                            list_make1_int(parse->resultRelation),
+                                            list_make1(path),
+                                            list_make1(root),
+                                            withCheckOptionLists,
+                                            returningLists,
+                                            rowMarks,
+                                            parse->onConflict,
+                                            SS_assign_special_param(root));
+            }
+
+            /* And shove it into final_rel */
+            add_path(final_rel, path);
+        }
+    }
 
     /*
      * Generate partial paths for final_rel, too, if outer query levels might
@@ -2259,6 +2336,129 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 }
 
 /*
+ * partitionwise_adjust_scanjoin_target
+ *      adjusts query's targetlist for each partition in the partition tree
+ *      whose root is 'parent' and apply it to their paths via
+ *      apply_scanjoin_target_to_paths
+ *
+ * Its output also consists of various pieces of information that will go
+ * into the ModifyTable node that will be created for this query.
+ */
+static void
+partitionwise_adjust_scanjoin_target(PlannerInfo *root,
+                                     RelOptInfo *parent,
+                                     List **subroots,
+                                     List **partitioned_rels,
+                                     List **resultRelations,
+                                     List **subpaths,
+                                     List **WCOLists,
+                                     List **returningLists,
+                                     List **rowMarks)
+{
+    Query      *parse = root->parse;
+    int         i;
+
+    *partitioned_rels = lappend(*partitioned_rels,
+                                list_make1_int(parent->relid));
+
+    for (i = 0; i < parent->nparts; i++)
+    {
+        RelOptInfo *child_rel = parent->part_rels[i];
+        AppendRelInfo *appinfo;
+        int         relid;
+        List       *tlist;
+        PathTarget *scanjoin_target;
+        bool        scanjoin_target_parallel_safe;
+        bool        scanjoin_target_same_exprs;
+        PlannerInfo *partition_subroot;
+        Query      *partition_parse;
+
+        /* Ignore pruned partitions. */
+        if (IS_DUMMY_REL(child_rel))
+            continue;
+
+        /*
+         * Extract the original relid of partition to fetch its AppendRelInfo.
+         * We must find it like this, because
+         * partitionwise_make_rel_from_joinlist replaces the original rel
+         * with one generated by join planning which may be different.
+         */
+        relid = -1;
+        while ((relid = bms_next_member(child_rel->relids, relid)) > 0)
+            if (root->append_rel_array[relid] &&
+                root->append_rel_array[relid]->parent_relid ==
+                parent->relid)
+                break;
+
+        appinfo = root->append_rel_array[relid];
+
+        /* Translate Query structure for this partition. */
+        partition_parse = (Query *)
+                        adjust_appendrel_attrs(root,
+                                               (Node *) parse,
+                                               1, &appinfo);
+
+        /* Recurse if partition is itself a partitioned table. */
+        if (child_rel->part_scheme)
+        {
+            root->parse = partition_parse;
+            partitionwise_adjust_scanjoin_target(root, child_rel,
+                                                 subroots,
+                                                 partitioned_rels,
+                                                 resultRelations,
+                                                 subpaths,
+                                                 WCOLists,
+                                                 returningLists,
+                                                 rowMarks);
+            /* Restore the Query for processing the next partition. */
+            root->parse = parse;
+        }
+        else
+        {
+            /*
+             * Generate a separate PlannerInfo for this partition.  We'll need
+             * it when generating the ModifyTable subplan for this partition.
+             */
+            partition_subroot = makeNode(PlannerInfo);
+            *subroots = lappend(*subroots, partition_subroot);
+            memcpy(partition_subroot, root, sizeof(PlannerInfo));
+            partition_subroot->parse = partition_parse;
+
+            /*
+             * Preprocess the translated targetlist and save it in the
+             * partition's PlannerInfo for the perusal of later planning
+             * steps.
+             */
+            tlist = preprocess_targetlist(partition_subroot);
+            partition_subroot->processed_tlist = tlist;
+
+            /* Apply scan/join target. */
+            scanjoin_target = create_pathtarget(root, tlist);
+            scanjoin_target_same_exprs = equal(scanjoin_target->exprs,
+                                               child_rel->reltarget->exprs);
+            scanjoin_target_parallel_safe =
+                is_parallel_safe(root, (Node *) scanjoin_target->exprs);
+            apply_scanjoin_target_to_paths(root, child_rel,
+                                           list_make1(scanjoin_target),
+                                           NIL,
+                                           scanjoin_target_parallel_safe,
+                                           scanjoin_target_same_exprs);
+
+            /* Collect information that will go into the ModifyTable */
+            *resultRelations = lappend_int(*resultRelations, relid);
+            *subpaths = lappend(*subpaths, child_rel->cheapest_total_path);
+            if (partition_parse->withCheckOptions)
+                *WCOLists = lappend(*WCOLists, partition_parse->withCheckOptions);
+            if (partition_parse->returningList)
+                *returningLists = lappend(*returningLists,
+                                          partition_parse->returningList);
+            if (partition_parse->rowMarks)
+                *rowMarks = lappend(*rowMarks, partition_parse->rowMarks);
+        }
+    }
+}
+
+/*
  * Do preprocessing for groupingSets clause and related data.  This handles the
  * preliminary steps of expanding the grouping sets, organizing them into lists
  * of rollups, and preparing annotations which will later be filled in with
@@ -6964,7 +7164,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
         }
 
         /* Build new paths for this relation by appending child paths. */
-        if (live_children != NIL)
+        if (live_children != NIL &&
+            !(rel->reloptkind == RELOPT_BASEREL &&
+              rel->relid == root->parse->resultRelation))
             add_paths_to_append_rel(root, rel, live_children);
     }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 690b6bbab7..f4c485cdc9 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -2265,8 +2265,34 @@ adjust_appendrel_attrs_mutator(Node *node,
                                               context->appinfos);
         return (Node *) phv;
     }
+
+    if (IsA(node, SpecialJoinInfo))
+    {
+        SpecialJoinInfo *oldinfo = (SpecialJoinInfo *) node;
+        SpecialJoinInfo *newinfo = makeNode(SpecialJoinInfo);
+
+        memcpy(newinfo, oldinfo, sizeof(SpecialJoinInfo));
+        newinfo->min_lefthand = adjust_child_relids(oldinfo->min_lefthand,
+                                                    context->nappinfos,
+                                                    context->appinfos);
+        newinfo->min_righthand = adjust_child_relids(oldinfo->min_righthand,
+                                                     context->nappinfos,
+                                                     context->appinfos);
+        newinfo->syn_lefthand = adjust_child_relids(oldinfo->syn_lefthand,
+                                                    context->nappinfos,
+                                                    context->appinfos);
+        newinfo->syn_righthand = adjust_child_relids(oldinfo->syn_righthand,
+                                                     context->nappinfos,
+                                                     context->appinfos);
+        newinfo->semi_rhs_exprs =
+                    (List *) expression_tree_mutator((Node *)
+                                                     oldinfo->semi_rhs_exprs,
+                                                     adjust_appendrel_attrs_mutator,
+                                                     (void *) context);
+        return (Node *) newinfo;
+    }
+
     /* Shouldn't need to handle planner auxiliary nodes here */
-    Assert(!IsA(node, SpecialJoinInfo));
     Assert(!IsA(node, AppendRelInfo));
     Assert(!IsA(node, PlaceHolderInfo));
     Assert(!IsA(node, MinMaxAggInfo));
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8369e3ad62..8d67f21f42 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1265,36 +1265,6 @@ get_relation_constraints(PlannerInfo *root,
         }
     }
 
-    /*
-     * Append partition predicates, if any.
-     *
-     * For selects, partition pruning uses the parent table's partition bound
-     * descriptor, instead of constraint exclusion which is driven by the
-     * individual partition's partition constraint.
-     */
-    if (enable_partition_pruning && root->parse->commandType != CMD_SELECT)
-    {
-        List       *pcqual = RelationGetPartitionQual(relation);
-
-        if (pcqual)
-        {
-            /*
-             * Run the partition quals through const-simplification similar to
-             * check constraints.  We skip canonicalize_qual, though, because
-             * partition quals should be in canonical form already; also,
-             * since the qual is in implicit-AND format, we'd have to
-             * explicitly convert it to explicit-AND format and back again.
-             */
-            pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
-
-            /* Fix Vars to have the desired varno */
-            if (varno != 1)
-                ChangeVarNodes((Node *) pcqual, 1, varno, 0);
-
-            result = list_concat(result, pcqual);
-        }
-    }
-
     heap_close(relation, NoLock);
 
     return result;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 7d04d12c6e..9074182512 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1752,7 +1752,7 @@ WHERE EXISTS (
                            Filter: (c IS NULL)
                      ->  Nested Loop
                            ->  Seq Scan on int4_tbl
-                           ->  Subquery Scan on ss_1
+                           ->  Subquery Scan on ss
                                  ->  Limit
                                        ->  Seq Scan on int8_tbl int8_tbl_1
          ->  Nested Loop Semi Join
@@ -1760,7 +1760,7 @@ WHERE EXISTS (
                            Filter: (c IS NULL)
                      ->  Nested Loop
                            ->  Seq Scan on int4_tbl
-                           ->  Subquery Scan on ss_2
+                           ->  Subquery Scan on ss
                                  ->  Limit
                                        ->  Seq Scan on int8_tbl int8_tbl_2
 (28 rows)
--
2.11.0
From bed30ca4b5ddd258a7593d24aeffd7db2a6e70c9 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 16 May 2018 14:35:40 +0900
Subject: [PATCH 2/3] Lazy creation of partition objects for planning

With the current approach, *all* partitions are opened and range
table entries are created for them in the planner's prep phase, which
is much sooner than when partition pruning is performed.  This means
that query_planner ends up spending cycles and memory on many
partitions that potentially won't be included in the plan, such as
creating RelOptInfos, AppendRelInfos.

To avoid that, add partition range table entries and other planning
data structures for only partitions that remain after applying
partition pruning.

Some code, like that of partitionwise join, relies on the fact that
even though partitions may have been pruned, they would still have a
RelOptInfo, albeit marked dummy to handle the outer join case where
the pruned partition appears on the nullable side of join.  So this
commit also teaches the partitionwise join code to allocate dummy
RelOptInfos for pruned partitions.

There are a couple of regression test diffs caused by the fact that
we no longer allocate a duplicate RT entry for a partitioned table in
its role as child and also that the individual partition RT entries
are now created in the order in which their parents are processed,
whereas previously they'd be added to the range table in the order of
depth-first expansion of the tree.
--- src/backend/optimizer/path/allpaths.c | 60 +++-- src/backend/optimizer/path/joinrels.c | 5 + src/backend/optimizer/plan/initsplan.c | 60 +++++ src/backend/optimizer/plan/planmain.c | 30 --- src/backend/optimizer/plan/planner.c | 8 +- src/backend/optimizer/prep/prepunion.c | 314 +++++++++------------- src/backend/optimizer/util/plancat.c | 12 +- src/backend/optimizer/util/relnode.c | 169 ++++++++++-- src/backend/partitioning/partprune.c | 100 ++++--- src/include/nodes/relation.h | 4 + src/include/optimizer/pathnode.h | 6 + src/include/optimizer/plancat.h | 2 +- src/include/optimizer/planmain.h | 3 + src/include/optimizer/prep.h | 10 + src/include/partitioning/partprune.h | 2 +- src/test/regress/expected/join.out | 22 +- src/test/regress/expected/partition_aggregate.out | 4 +- 17 files changed, 486 insertions(+), 325 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 5937c0436a..d6d1e26209 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -151,6 +151,7 @@ make_one_rel(PlannerInfo *root, List *joinlist) { RelOptInfo *rel; Index rti; + double total_pages; /* * Construct the all_baserels Relids set. @@ -181,6 +182,35 @@ make_one_rel(PlannerInfo *root, List *joinlist) * then generate access paths. */ set_base_rel_sizes(root); + + /* + * We should now have size estimates for every actual table involved in + * the query, and we also know which if any have been deleted from the + * query by join removal; so we can compute total_table_pages. + * + * Note that appendrels are not double-counted here, even though we don't + * bother to distinguish RelOptInfos for appendrel parents, because the + * parents will still have size zero. + * + * XXX if a table is self-joined, we will count it once per appearance, + * which perhaps is the wrong thing ... but that's not completely clear, + * and detecting self-joins here is difficult, so ignore it for now. 
+ */ + total_pages = 0; + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *brel = root->simple_rel_array[rti]; + + if (brel == NULL) + continue; + + Assert(brel->relid == rti); /* sanity check on array */ + + if (IS_SIMPLE_REL(brel)) + total_pages += (double) brel->pages; + } + root->total_table_pages = total_pages; + set_base_rel_pathlists(root); /* @@ -896,8 +926,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, double *parent_attrsizes; int nattrs; ListCell *l; - Relids live_children = NULL; - bool did_pruning = false; /* Guard against stack overflow due to overly deep inheritance tree. */ check_stack_depth(); @@ -913,21 +941,14 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, * partitioned table's list will contain all such indexes. */ if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { rel->partitioned_child_rels = list_make1_int(rti); - /* - * If the partitioned relation has any baserestrictinfo quals then we - * attempt to use these quals to prune away partitions that cannot - * possibly contain any tuples matching these quals. In this case we'll - * store the relids of all partitions which could possibly contain a - * matching tuple, and skip anything else in the loop below. - */ - if (enable_partition_pruning && - rte->relkind == RELKIND_PARTITIONED_TABLE && - rel->baserestrictinfo != NIL) - { - live_children = prune_append_rel_partitions(rel); - did_pruning = true; + /* + * And do pruning. Note that this adds AppendRelInfos for only the + * partitions that are not pruned. + */ + prune_append_rel_partitions(root, rel); } /* @@ -1178,13 +1199,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, continue; } - if (did_pruning && !bms_is_member(appinfo->child_relid, live_children)) - { - /* This partition was pruned; skip it. 
*/ - set_dummy_rel_pathlist(childrel); - continue; - } - if (relation_excluded_by_constraints(root, childrel, childRTE)) { /* @@ -2637,7 +2651,7 @@ partitionwise_make_rel_from_joinlist(PlannerInfo *root, List *saved_join_info_list = list_copy(root->join_info_list); /* Ignore pruned partitions. */ - if (IS_DUMMY_REL(partrel)) + if (partrel == NULL || IS_DUMMY_REL(partrel)) continue; /* diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 7008e1318e..af9c4ac8fd 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -1369,6 +1369,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, AppendRelInfo **appinfos; int nappinfos; + if (child_rel1 == NULL) + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts); + if (child_rel2 == NULL) + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts); + /* We should never try to join two overlapping sets of rels. */ Assert(!bms_overlap(child_rel1->relids, child_rel2->relids)); child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids); diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 01335db511..d85f782d50 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -132,6 +132,66 @@ add_base_rels_to_query(PlannerInfo *root, Node *jtnode) (int) nodeTag(jtnode)); } +/* + * add_rel_partitions_to_query + * create range table entries and "otherrel" RelOptInfos for the + * partitions of 'rel' specified by the caller + * + * To store the objects thus created, various arrays in 'root' are expanded + * by repalloc'ing them. 
+ */ +void +add_rel_partitions_to_query(PlannerInfo *root, RelOptInfo *rel, + bool scan_all_parts, + Bitmapset *partindexes) +{ + int new_size; + int num_added_parts; + int i; + + Assert(partindexes != NULL || scan_all_parts); + + /* Expand the PlannerInfo arrays to hold new partition objects. */ + num_added_parts = scan_all_parts ? rel->nparts : + bms_num_members(partindexes); + new_size = root->simple_rel_array_size + num_added_parts; + root->simple_rte_array = (RangeTblEntry **) + repalloc(root->simple_rte_array, + sizeof(RangeTblEntry *) * new_size); + root->simple_rel_array = (RelOptInfo **) + repalloc(root->simple_rel_array, + sizeof(RelOptInfo *) * new_size); + if (root->append_rel_array) + root->append_rel_array = (AppendRelInfo **) + repalloc(root->append_rel_array, + sizeof(AppendRelInfo *) * new_size); + else + root->append_rel_array = (AppendRelInfo **) + palloc0(sizeof(AppendRelInfo *) * + new_size); + + /* Set the contents of just allocated memory to 0. */ + MemSet(root->simple_rte_array + root->simple_rel_array_size, + 0, sizeof(RangeTblEntry *) * num_added_parts); + MemSet(root->simple_rel_array + root->simple_rel_array_size, + 0, sizeof(RelOptInfo *) * num_added_parts); + MemSet(root->append_rel_array + root->simple_rel_array_size, + 0, sizeof(AppendRelInfo *) * num_added_parts); + root->simple_rel_array_size = new_size; + + /* And add the partitions. 
*/ + if (scan_all_parts) + for (i = 0; i < rel->nparts; i++) + rel->part_rels[i] = build_partition_rel(root, rel, + rel->part_oids[i]); + else + { + i = -1; + while ((i = bms_next_member(partindexes, i)) >= 0) + rel->part_rels[i] = build_partition_rel(root, rel, + rel->part_oids[i]); + } +} /***************************************************************************** * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 3f0d80eaa6..1bd3f0e350 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -57,8 +57,6 @@ query_planner(PlannerInfo *root, List *tlist, Query *parse = root->parse; List *joinlist; RelOptInfo *final_rel; - Index rti; - double total_pages; /* * If the query has an empty join tree, then it's something easy like @@ -232,34 +230,6 @@ query_planner(PlannerInfo *root, List *tlist, extract_restriction_or_clauses(root); /* - * We should now have size estimates for every actual table involved in - * the query, and we also know which if any have been deleted from the - * query by join removal; so we can compute total_table_pages. - * - * Note that appendrels are not double-counted here, even though we don't - * bother to distinguish RelOptInfos for appendrel parents, because the - * parents will still have size zero. - * - * XXX if a table is self-joined, we will count it once per appearance, - * which perhaps is the wrong thing ... but that's not completely clear, - * and detecting self-joins here is difficult, so ignore it for now. - */ - total_pages = 0; - for (rti = 1; rti < root->simple_rel_array_size; rti++) - { - RelOptInfo *brel = root->simple_rel_array[rti]; - - if (brel == NULL) - continue; - - Assert(brel->relid == rti); /* sanity check on array */ - - if (IS_SIMPLE_REL(brel)) - total_pages += (double) brel->pages; - } - root->total_table_pages = total_pages; - - /* * Ready to do the primary planning. 
*/ final_rel = make_one_rel(root, joinlist); diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 076dbd3d62..88db46a6e5 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -2374,7 +2374,7 @@ partitionwise_adjust_scanjoin_target(PlannerInfo *root, Query *partition_parse; /* Ignore pruned partitions. */ - if (IS_DUMMY_REL(child_rel)) + if (child_rel == NULL || IS_DUMMY_REL(child_rel)) continue; /* @@ -7134,6 +7134,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root, int nappinfos; List *child_scanjoin_targets = NIL; + if (child_rel == NULL) + continue; + /* Translate scan/join targets for this child. */ appinfos = find_appinfos_by_relids(root, child_rel->relids, &nappinfos); @@ -7237,6 +7240,9 @@ create_partitionwise_grouping_paths(PlannerInfo *root, RelOptInfo *child_grouped_rel; RelOptInfo *child_partially_grouped_rel; + if (child_input_rel == NULL) + continue; + /* Input child rel must have a path */ Assert(child_input_rel->pathlist != NIL); diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index f4c485cdc9..279f686fb0 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -49,6 +49,8 @@ #include "parser/parse_coerce.h" #include "parser/parsetree.h" #include "utils/lsyscache.h" +#include "utils/lsyscache.h" +#include "utils/partcache.h" #include "utils/rel.h" #include "utils/selfuncs.h" #include "utils/syscache.h" @@ -101,21 +103,10 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti); -static void expand_partitioned_rtentry(PlannerInfo *root, - RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, LOCKMODE lockmode, - List **appinfos); -static void 
expand_single_inheritance_child(PlannerInfo *root, - RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, Relation childrel, - List **appinfos, RangeTblEntry **childrte_p, - Index *childRTindex_p); -static void make_inh_translation_list(Relation oldrelation, - Relation newrelation, - Index newvarno, - List **translated_vars); +static void make_inh_translation_list(TupleDesc old_tupdesc, + TupleDesc new_tupdesc, + RangeTblEntry *oldrte, RangeTblEntry *newrte, + Index newvarno, List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, @@ -1522,6 +1513,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) LOCKMODE lockmode; List *inhOIDs; ListCell *l; + List *appinfos = NIL; /* Does RT entry allow inheritance? */ if (!rte->inh) @@ -1585,173 +1577,58 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) if (oldrc) oldrc->isParent = true; + /* Partitioned tables are expanded elsewhere. */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { + list_free(inhOIDs); + return; + } + /* * Must open the parent relation to examine its tupdesc. We need not lock * it; we assume the rewriter already did. */ oldrelation = heap_open(parentOID, NoLock); - /* Scan the inheritance set and expand it */ - if (RelationGetPartitionDesc(oldrelation) != NULL) + foreach(l, inhOIDs) { - Assert(rte->relkind == RELKIND_PARTITIONED_TABLE); + Oid childOID = lfirst_oid(l); + Index childRTindex = 0; + RangeTblEntry *childrte = NULL; + AppendRelInfo *appinfo = NULL; - /* - * If this table has partitions, recursively expand them in the order - * in which they appear in the PartitionDesc. While at it, also - * extract the partition key columns of all the partitioned tables. 
- */ - expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc, - lockmode, &root->append_rel_list); + add_inheritance_child_to_query(root, rte, rti, + oldrelation->rd_rel->reltype, + RelationGetDescr(oldrelation), + oldrc, childOID, NoLock, + &appinfo, &childrte, + &childRTindex); + Assert(childRTindex > 1); + Assert(childrte != NULL); + Assert(appinfo != NULL); + appinfos = lappend(appinfos, appinfo); } + + /* + * If all the children were temp tables, pretend it's a + * non-inheritance situation; we don't need Append node in that case. + * The duplicate RTE we added for the parent table is harmless, so we + * don't bother to get rid of it; ditto for the useless PlanRowMark + * node. + */ + if (list_length(appinfos) < 2) + rte->inh = false; else - { - List *appinfos = NIL; - RangeTblEntry *childrte; - Index childRTindex; - - /* - * This table has no partitions. Expand any plain inheritance - * children in the order the OIDs were returned by - * find_all_inheritors. - */ - foreach(l, inhOIDs) - { - Oid childOID = lfirst_oid(l); - Relation newrelation; - - /* Open rel if needed; we already have required locks */ - if (childOID != parentOID) - newrelation = heap_open(childOID, NoLock); - else - newrelation = oldrelation; - - /* - * It is possible that the parent table has children that are temp - * tables of other backends. We cannot safely access such tables - * (because of buffering issues), and the best thing to do seems - * to be to silently ignore them. - */ - if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation)) - { - heap_close(newrelation, lockmode); - continue; - } - - expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc, - newrelation, - &appinfos, &childrte, - &childRTindex); - - /* Close child relations, but keep locks */ - if (childOID != parentOID) - heap_close(newrelation, NoLock); - } - - /* - * If all the children were temp tables, pretend it's a - * non-inheritance situation; we don't need Append node in that case. 
- * The duplicate RTE we added for the parent table is harmless, so we - * don't bother to get rid of it; ditto for the useless PlanRowMark - * node. - */ - if (list_length(appinfos) < 2) - rte->inh = false; - else - root->append_rel_list = list_concat(root->append_rel_list, - appinfos); - - } + root->append_rel_list = list_concat(root->append_rel_list, + appinfos); heap_close(oldrelation, NoLock); } /* - * expand_partitioned_rtentry - * Recursively expand an RTE for a partitioned table. - * - * Note that RelationGetPartitionDispatchInfo will expand partitions in the - * same order as this code. - */ -static void -expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, LOCKMODE lockmode, - List **appinfos) -{ - int i; - RangeTblEntry *childrte; - Index childRTindex; - PartitionDesc partdesc = RelationGetPartitionDesc(parentrel); - - check_stack_depth(); - - /* A partitioned table should always have a partition descriptor. */ - Assert(partdesc); - - Assert(parentrte->inh); - - /* - * Note down whether any partition key cols are being updated. Though it's - * the root partitioned table's updatedCols we are interested in, we - * instead use parentrte to get the updatedCols. This is convenient - * because parentrte already has the root partrel's updatedCols translated - * to match the attribute ordering of parentrel. - */ - if (!root->partColsUpdated) - root->partColsUpdated = - has_partition_attrs(parentrel, parentrte->updatedCols, NULL); - - /* First expand the partitioned table itself. */ - expand_single_inheritance_child(root, parentrte, parentRTindex, parentrel, - top_parentrc, parentrel, - appinfos, &childrte, &childRTindex); - - /* - * If the partitioned table has no partitions, treat this as the - * non-inheritance case. 
- */ - if (partdesc->nparts == 0) - { - parentrte->inh = false; - return; - } - - for (i = 0; i < partdesc->nparts; i++) - { - Oid childOID = partdesc->oids[i]; - Relation childrel; - - /* Open rel; we already have required locks */ - childrel = heap_open(childOID, NoLock); - - /* - * Temporary partitions belonging to other sessions should have been - * disallowed at definition, but for paranoia's sake, let's double - * check. - */ - if (RELATION_IS_OTHER_TEMP(childrel)) - elog(ERROR, "temporary relation from another session found as partition"); - - expand_single_inheritance_child(root, parentrte, parentRTindex, - parentrel, top_parentrc, childrel, - appinfos, &childrte, &childRTindex); - - /* If this child is itself partitioned, recurse */ - if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - expand_partitioned_rtentry(root, childrte, childRTindex, - childrel, top_parentrc, lockmode, - appinfos); - - /* Close child relation, but keep locks */ - heap_close(childrel, NoLock); - } -} - -/* - * expand_single_inheritance_child + * add_inheritance_child_to_query * Build a RangeTblEntry and an AppendRelInfo, if appropriate, plus - * maybe a PlanRowMark. + * maybe a PlanRowMark for a child relation. * * We now expand the partition hierarchy level by level, creating a * corresponding hierarchy of AppendRelInfos and RelOptInfos, where each @@ -1769,19 +1646,70 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, * The child RangeTblEntry and its RTI are returned in "childrte_p" and * "childRTindex_p" resp. 
*/ -static void -expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, Relation childrel, - List **appinfos, RangeTblEntry **childrte_p, - Index *childRTindex_p) +void +add_inheritance_child_to_query(PlannerInfo *root, RangeTblEntry *parentrte, + Index parentRTindex, Oid parentRelType, + TupleDesc parentDesc, + PlanRowMark *top_parentrc, + Oid childOID, int lockmode, + AppendRelInfo **appinfo_p, + RangeTblEntry **childrte_p, + Index *childRTindex_p) { Query *parse = root->parse; - Oid parentOID = RelationGetRelid(parentrel); - Oid childOID = RelationGetRelid(childrel); + Oid parentOID = parentrte->relid; RangeTblEntry *childrte; Index childRTindex; AppendRelInfo *appinfo; + Relation childrel = NULL; + char child_relkind; + Oid child_reltype; + TupleDesc childDesc; + + *appinfo_p = NULL; + *childrte_p = NULL; + *childRTindex_p = 0; + + /* Open rel if needed; we already have required locks */ + if (childOID != parentOID) + { + childrel = heap_open(childOID, lockmode); + + /* + * Temporary partitions belonging to other sessions should have been + * disallowed at definition, but for paranoia's sake, let's double + * check. + */ + if (RELATION_IS_OTHER_TEMP(childrel)) + { + if (childrel->rd_rel->relispartition) + elog(ERROR, "temporary relation from another session found as partition"); + heap_close(childrel, lockmode); + return; + } + + child_relkind = childrel->rd_rel->relkind; + + /* + * No point in adding to the query a partitioned table that has no + * partitions. 
+ */ + if (child_relkind == RELKIND_PARTITIONED_TABLE && + RelationGetPartitionDesc(childrel)->nparts == 0) + { + heap_close(childrel, lockmode); + return; + } + + child_reltype = childrel->rd_rel->reltype; + childDesc = RelationGetDescr(childrel); + } + else + { + child_relkind = parentrte->relkind; + child_reltype = parentRelType; + childDesc = parentDesc; + } /* * Build an RTE for the child, and attach to query's rangetable list. We @@ -1798,7 +1726,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, childrte = copyObject(parentrte); *childrte_p = childrte; childrte->relid = childOID; - childrte->relkind = childrel->rd_rel->relkind; + childrte->relkind = child_relkind; /* A partitioned child will need to be expanded further. */ if (childOID != parentOID && childrte->relkind == RELKIND_PARTITIONED_TABLE) @@ -1823,12 +1751,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, appinfo = makeNode(AppendRelInfo); appinfo->parent_relid = parentRTindex; appinfo->child_relid = childRTindex; - appinfo->parent_reltype = parentrel->rd_rel->reltype; - appinfo->child_reltype = childrel->rd_rel->reltype; - make_inh_translation_list(parentrel, childrel, childRTindex, + appinfo->parent_reltype = parentRelType; + appinfo->child_reltype = child_reltype; + make_inh_translation_list(parentDesc, childDesc, + parentrte, childrte, childRTindex, &appinfo->translated_vars); appinfo->parent_reloid = parentOID; - *appinfos = lappend(*appinfos, appinfo); + *appinfo_p = appinfo; /* * Translate the column permissions bitmaps to the child's attnums (we @@ -1879,6 +1808,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, root->rowMarks = lappend(root->rowMarks, childrc); } + + /* Close child relations, but keep locks */ + if (childOID != parentOID) + { + Assert(childrel != NULL); + heap_close(childrel, lockmode); + } } /* @@ -1889,14 +1825,12 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte, * For paranoia's sake, we match type/collation as well as attribute name. */ static void -make_inh_translation_list(Relation oldrelation, Relation newrelation, - Index newvarno, - List **translated_vars) +make_inh_translation_list(TupleDesc old_tupdesc, TupleDesc new_tupdesc, + RangeTblEntry *oldrte, RangeTblEntry *newrte, + Index newvarno, List **translated_vars) { List *vars = NIL; - TupleDesc old_tupdesc = RelationGetDescr(oldrelation); - TupleDesc new_tupdesc = RelationGetDescr(newrelation); - Oid new_relid = RelationGetRelid(newrelation); + Oid new_relid = newrte->relid; int oldnatts = old_tupdesc->natts; int newnatts = new_tupdesc->natts; int old_attno; @@ -1926,7 +1860,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, * When we are generating the "translation list" for the parent table * of an inheritance set, no need to search for matches. */ - if (oldrelation == newrelation) + if (oldrte->relid == newrte->relid) { vars = lappend(vars, makeVar(newvarno, (AttrNumber) (old_attno + 1), @@ -1955,7 +1889,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, newtup = SearchSysCacheAttName(new_relid, attname); if (!newtup) elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"", - attname, RelationGetRelationName(newrelation)); + attname, get_rel_name(newrte->relid)); new_attno = ((Form_pg_attribute) GETSTRUCT(newtup))->attnum - 1; ReleaseSysCache(newtup); @@ -1965,10 +1899,10 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, /* Found it, check type and collation match */ if (atttypid != att->atttypid || atttypmod != att->atttypmod) elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's type", - attname, RelationGetRelationName(newrelation)); + attname, get_rel_name(newrte->relid)); if (attcollation != att->attcollation) elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's collation", - attname, 
RelationGetRelationName(newrelation)); + attname, get_rel_name(newrte->relid)); vars = lappend(vars, makeVar(newvarno, (AttrNumber) (new_attno + 1), @@ -2121,7 +2055,7 @@ adjust_appendrel_attrs_mutator(Node *node, } } - if (var->varlevelsup == 0 && appinfo) + if (var->varlevelsup == 0 && appinfo && appinfo->translated_vars) { var->varno = appinfo->child_relid; var->varnoold = appinfo->child_relid; diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 8d67f21f42..100dfd8e0c 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -106,7 +106,7 @@ static void set_baserel_partition_key_exprs(Relation relation, */ void get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, - RelOptInfo *rel) + Bitmapset *updatedCols, RelOptInfo *rel) { Index varno = rel->relid; Relation relation; @@ -449,7 +449,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, * inheritance parents may be partitioned. 
*/ if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { set_relation_partition_info(root, rel, relation); + if (!root->partColsUpdated) + root->partColsUpdated = + has_partition_attrs(relation, updatedCols, NULL); + } + + rel->tupdesc = RelationGetDescr(relation); + rel->reltype = RelationGetForm(relation)->reltype; heap_close(relation, NoLock); @@ -1883,6 +1891,8 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, rel->nparts = partdesc->nparts; set_baserel_partition_key_exprs(relation, rel); rel->partition_qual = RelationGetPartitionQual(relation); + rel->part_oids = (Oid *) palloc(rel->nparts * sizeof(Oid)); + memcpy(rel->part_oids, partdesc->oids, rel->nparts * sizeof(Oid)); } /* diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index c69740eda6..b267f07c18 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,6 +16,7 @@ #include <limits.h> +#include "catalog/pg_class.h" #include "miscadmin.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" @@ -27,6 +28,7 @@ #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" #include "partitioning/partbounds.h" +#include "storage/lockdefs.h" #include "utils/hsearch.h" @@ -137,6 +139,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) /* Rel should not exist already */ Assert(relid > 0 && relid < root->simple_rel_array_size); + if (root->simple_rel_array[relid] != NULL) elog(ERROR, "rel %d already exists", relid); @@ -218,7 +221,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) { case RTE_RELATION: /* Table --- retrieve statistics from the system catalogs */ - get_relation_info(root, rte->relid, rte->inh, rel); + get_relation_info(root, rte->relid, rte->inh, rte->updatedCols, + rel); break; case RTE_SUBQUERY: case RTE_FUNCTION: @@ -268,41 +272,30 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) if (rte->inh) { ListCell 
*l; - int nparts = rel->nparts; - int cnt_parts = 0; - if (nparts > 0) + /* + * For partitioned tables, we just allocate space for RelOptInfo + * pointers for all partitions and copy the partition OIDs from the + * relcache. Actual RelOptInfo is built for a partition only if it is + * not pruned. + */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { rel->part_rels = (RelOptInfo **) - palloc(sizeof(RelOptInfo *) * nparts); + palloc0(sizeof(RelOptInfo *) * rel->nparts); + return rel; + } foreach(l, root->append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); - RelOptInfo *childrel; /* append_rel_list contains all append rels; ignore others */ if (appinfo->parent_relid != relid) continue; - childrel = build_simple_rel(root, appinfo->child_relid, - rel); - - /* Nothing more to do for an unpartitioned table. */ - if (!rel->part_scheme) - continue; - - /* - * The order of partition OIDs in append_rel_list is the same as - * the order in the PartitionDesc, so the order of part_rels will - * also match the PartitionDesc. See expand_partitioned_rtentry. - */ - Assert(cnt_parts < nparts); - rel->part_rels[cnt_parts] = childrel; - cnt_parts++; + (void) build_simple_rel(root, appinfo->child_relid, rel); } - - /* We should have seen all the child partitions. */ - Assert(cnt_parts == nparts); } return rel; @@ -1768,3 +1761,131 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel, joinrel->nullable_partexprs[cnt] = nullable_partexpr; } } + +/* + * build_dummy_partition_rel + * Build a RelOptInfo and AppendRelInfo for a pruned partition + * + * This does not result in opening the relation or a range table entry being + * created. Also, the RelOptInfo thus created is not stored anywhere else + * beside the parent's part_rels array. 
+ * + * The only reason this exists is because partition-wise join, in some cases, + * needs a RelOptInfo to represent an empty relation that's on the nullable + * side of an outer join, so that a Path representing the outer join can be + * created. + */ +RelOptInfo * +build_dummy_partition_rel(PlannerInfo *root, RelOptInfo *parent, int partidx) +{ + RelOptInfo *rel; + + Assert(parent->part_rels[partidx] == NULL); + + /* Create minimally valid-looking RelOptInfo with parent's relid. */ + rel = makeNode(RelOptInfo); + rel->reloptkind = RELOPT_OTHER_MEMBER_REL; + rel->relid = parent->relid; + rel->relids = bms_copy(parent->relids); + if (parent->top_parent_relids) + rel->top_parent_relids = parent->top_parent_relids; + else + rel->top_parent_relids = bms_copy(parent->relids); + rel->reltarget = copy_pathtarget(parent->reltarget); + parent->part_rels[partidx] = rel; + mark_dummy_rel(rel); + + /* + * Now we'll need a (noop) AppendRelInfo for parent, because we're setting + * the dummy partition's relid to be same as the parent's. 
+ */ + if (root->append_rel_array[parent->relid] == NULL) + { + AppendRelInfo *appinfo = makeNode(AppendRelInfo); + + appinfo->parent_relid = parent->relid; + appinfo->child_relid = parent->relid; + appinfo->parent_reltype = parent->reltype; + appinfo->child_reltype = parent->reltype; + /* leaving translated_vars to NIL to mean no translation needed */ + appinfo->parent_reloid = root->simple_rte_array[parent->relid]->relid; + root->append_rel_array[parent->relid] = appinfo; + } + + return rel; +} + +/* + * build_partition_rel + * This adds a valid partition to the query by adding it to the + * range table and creating planner data structures for it + */ +RelOptInfo * +build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid) +{ + RangeTblEntry *parentrte = root->simple_rte_array[parent->relid]; + RelOptInfo *result; + Index partRTindex = 0; + RangeTblEntry *partrte = NULL; + AppendRelInfo *appinfo = NULL; + PlanRowMark *rootrc = NULL; + + /* Locate the root partitioned table and fetch its PlanRowMark, if any. */ + if (root->rowMarks) + { + Index rootRTindex = 0; + + /* + * The root partitioned table itself might be a child of UNION ALL + * parent, so we must resort to finding the root parent like this. + */ + rootRTindex = parent->relid; + if (root->append_rel_array[rootRTindex]) + { + AppendRelInfo *tmp = root->append_rel_array[rootRTindex]; + + /* + * Keep moving up until we reach the parent rel that's not a + * partitioned table. The one before that one would be the root + * parent. + */ + while(root->simple_rel_array[rootRTindex]->part_scheme) + { + tmp = root->append_rel_array[tmp->parent_relid]; + if (tmp == NULL) + break; + rootRTindex = tmp->parent_relid; + } + } + + rootrc = get_plan_rowmark(root->rowMarks, rootRTindex); + } + + /* + * expand_inherited_rtentry already locked all partitions, so pass + * NoLock for lockmode. 
+	 */
+	add_inheritance_child_to_query(root, parentrte, parent->relid,
+								   parent->reltype, parent->tupdesc,
+								   rootrc, partoid, NoLock,
+								   &appinfo, &partrte, &partRTindex);
+
+	/* Partition turned out to be a partitioned table with 0 partitions. */
+	if (partrte == NULL)
+		return NULL;
+
+	Assert(appinfo != NULL);
+	root->append_rel_list = lappend(root->append_rel_list, appinfo);
+	root->simple_rte_array[partRTindex] = partrte;
+	root->append_rel_array[partRTindex] = appinfo;
+
+	/* Build the RelOptInfo. */
+	result = build_simple_rel(root, partRTindex, parent);
+
+	/* Set the information created by create_lateral_join_info(). */
+	result->direct_lateral_relids = parent->direct_lateral_relids;
+	result->lateral_relids = parent->lateral_relids;
+	result->lateral_referencers = parent->lateral_referencers;
+
+	return result;
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index b5c1c7d4dd..331e2717b2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -45,7 +45,9 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
@@ -443,9 +445,18 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		for (i = 0; i < nparts; i++)
 		{
 			RelOptInfo *partrel = subpart->part_rels[i];
-			int			subplanidx = relid_subplan_map[partrel->relid] - 1;
-			int			subpartidx = relid_subpart_map[partrel->relid] - 1;
+			int			subplanidx;
+			int			subpartidx;
 
+			if (partrel == NULL)
+			{
+				subplan_map[i] = -1;
+				subpart_map[i] = -1;
+				continue;
+			}
+
+			subplanidx = relid_subplan_map[partrel->relid] - 1;
+			subpartidx = relid_subpart_map[partrel->relid] - 1;
 			subplan_map[i] = subplanidx;
 			subpart_map[i] = subpartidx;
 			if (subplanidx >= 0)
@@ -548,61 +559,68 @@ gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *contradictory)
  *
  * Callers must ensure that 'rel' is a partitioned table.
  */
-Relids
-prune_append_rel_partitions(RelOptInfo *rel)
+void
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
 {
-	Relids		result;
 	List	   *clauses = rel->baserestrictinfo;
 	List	   *pruning_steps;
-	bool		contradictory;
+	bool		contradictory,
+				scan_all_parts = false;
 	PartitionPruneContext context;
-	Bitmapset  *partindexes;
-	int			i;
+	Bitmapset  *partindexes = NULL;
 
-	Assert(clauses != NIL);
 	Assert(rel->part_scheme != NULL);
 
 	/* If there are no partitions, return the empty set */
 	if (rel->nparts == 0)
-		return NULL;
+		return;
 
-	/*
-	 * Process clauses. If the clauses are found to be contradictory, we can
-	 * return the empty set.
-	 */
-	pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);
-	if (contradictory)
-		return NULL;
-
-	/* Set up PartitionPruneContext */
-	context.strategy = rel->part_scheme->strategy;
-	context.partnatts = rel->part_scheme->partnatts;
-	context.nparts = rel->nparts;
-	context.boundinfo = rel->boundinfo;
-	context.partcollation = rel->part_scheme->partcollation;
-	context.partsupfunc = rel->part_scheme->partsupfunc;
-	context.stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
+	if (enable_partition_pruning && clauses != NIL)
+	{
+		/*
+		 * Process clauses. If the clauses are found to be contradictory, we
+		 * can return the empty set.
+		 */
+		pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);
+		if (!contradictory)
+		{
+			context.strategy = rel->part_scheme->strategy;
+			context.partnatts = rel->part_scheme->partnatts;
+			context.nparts = rel->nparts;
+			context.boundinfo = rel->boundinfo;
+			context.partcollation = rel->part_scheme->partcollation;
+			context.partsupfunc = rel->part_scheme->partsupfunc;
+			context.stepcmpfuncs = (FmgrInfo *)
+										palloc0(sizeof(FmgrInfo) *
 												context.partnatts *
 												list_length(pruning_steps));
-	context.ppccontext = CurrentMemoryContext;
+			context.ppccontext = CurrentMemoryContext;
 
-	/* These are not valid when being called from the planner */
-	context.partrel = NULL;
-	context.planstate = NULL;
-	context.exprstates = NULL;
-	context.exprhasexecparam = NULL;
-	context.evalexecparams = false;
+			/* These are not valid when being called from the planner */
+			context.partrel = NULL;
+			context.planstate = NULL;
+			context.exprstates = NULL;
+			context.exprhasexecparam = NULL;
+			context.evalexecparams = false;
 
-	/* Actual pruning happens here. */
-	partindexes = get_matching_partitions(&context, pruning_steps);
+			/* Actual pruning happens here. */
+			partindexes = get_matching_partitions(&context, pruning_steps);
 
-	/* Add selected partitions' RT indexes to result. */
-	i = -1;
-	result = NULL;
-	while ((i = bms_next_member(partindexes, i)) >= 0)
-		result = bms_add_member(result, rel->part_rels[i]->relid);
+			/* No need to add partitions if all were pruned. */
+			if (bms_is_empty(partindexes))
+				return;
+		}
+		else
+			scan_all_parts = true;
+	}
+	else
+		scan_all_parts = true;
 
-	return result;
+	/*
+	 * Build selected partitions' range table entries, RelOptInfos, and
+	 * AppendRelInfos.
+	 */
+	add_rel_partitions_to_query(root, rel, scan_all_parts, partindexes);
 }
 
 /*
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 41caf873fb..1e8371d814 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
 #define RELATION_H
 
 #include "access/sdir.h"
+#include "access/tupdesc.h"
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "nodes/params.h"
@@ -695,11 +696,14 @@ typedef struct RelOptInfo
 	int			nparts;			/* number of partitions */
 	struct PartitionBoundInfoData *boundinfo;	/* Partition bounds */
 	List	   *partition_qual; /* partition constraint */
+	Oid		   *part_oids;		/* partition OIDs */
 	struct RelOptInfo **part_rels;	/* Array of RelOptInfos of partitions,
 									 * stored in the same order of bounds */
 	List	  **partexprs;		/* Non-nullable partition key expressions. */
 	List	  **nullable_partexprs; /* Nullable partition key expressions. */
 	List	   *partitioned_child_rels; /* List of RT indexes. */
+	TupleDesc	tupdesc;
+	Oid			reltype;
 } RelOptInfo;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22650..4f567765a4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -297,5 +297,11 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 					 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 					 RelOptInfo *parent_joinrel, List *restrictlist,
 					 SpecialJoinInfo *sjinfo, JoinType jointype);
+extern RelOptInfo *build_dummy_partition_rel(PlannerInfo *root,
+						  RelOptInfo *parent,
+						  int partidx);
+extern RelOptInfo *build_partition_rel(PlannerInfo *root,
+					RelOptInfo *parent,
+					Oid partoid);
 
 #endif							/* PATHNODE_H */
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 7d53cbbb87..edaf2a3b4f 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -26,7 +26,7 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 
 
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
-				  bool inhparent, RelOptInfo *rel);
+				  bool inhparent, Bitmapset *updatedCols, RelOptInfo *rel);
 
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab0280d2..1916a33467 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,9 @@ extern int from_collapse_limit;
 extern int	join_collapse_limit;
 
 extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
+extern void add_rel_partitions_to_query(PlannerInfo *root, RelOptInfo *rel,
+							bool scan_all_parts,
+							Bitmapset *partindexes);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
 					   Relids where_needed, bool create_new_ph);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 38608770a2..ca66f75544 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,16 @@ extern RelOptInfo *plan_set_operations(PlannerInfo *root);
 
 extern void expand_inherited_tables(PlannerInfo *root);
 
+extern void add_inheritance_child_to_query(PlannerInfo *root,
+								RangeTblEntry *parentrte,
+								Index parentRTindex, Oid parentRelType,
+								TupleDesc parentDesc,
+								PlanRowMark *top_parentrc,
+								Oid childOID, int lockmode,
+								AppendRelInfo **appinfo_p,
+								RangeTblEntry **childrte_p,
+								Index *childRTindex_p);
+
 extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
 					   int nappinfos, AppendRelInfo **appinfos);
 
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index b95c346bab..55a324583b 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -79,7 +79,7 @@ extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
 						 List *subpaths,
 						 List *partitioned_rels,
 						 List *prunequal);
-extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern void prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
 						List *pruning_steps);
 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262be43..5f931591a6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5533,29 +5533,29 @@ select t1.b, ss.phv from join_ut1 t1 left join lateral
               (select t2.a as t2a, t3.a t3a, least(t1.a, t2.a, t3.a) phv
               from join_pt1 t2 join join_ut1 t3 on t2.a = t3.b) ss
               on t1.a = ss.t2a order by t1.a;
-                            QUERY PLAN
-------------------------------------------------------------------
+                             QUERY PLAN
+--------------------------------------------------------------------
  Sort
-   Output: t1.b, (LEAST(t1.a, t2.a, t3.a)), t1.a
+   Output: t1.b, (LEAST(t1.a, t2_1.a, t3.a)), t1.a
    Sort Key: t1.a
    ->  Nested Loop Left Join
-         Output: t1.b, (LEAST(t1.a, t2.a, t3.a)), t1.a
+         Output: t1.b, (LEAST(t1.a, t2_1.a, t3.a)), t1.a
          ->  Seq Scan on public.join_ut1 t1
                Output: t1.a, t1.b, t1.c
          ->  Hash Join
-               Output: t2.a, LEAST(t1.a, t2.a, t3.a)
-               Hash Cond: (t3.b = t2.a)
+               Output: t2_1.a, LEAST(t1.a, t2_1.a, t3.a)
+               Hash Cond: (t3.b = t2_1.a)
                ->  Seq Scan on public.join_ut1 t3
                      Output: t3.a, t3.b, t3.c
                ->  Hash
-                     Output: t2.a
+                     Output: t2_1.a
                      ->  Append
-                           ->  Seq Scan on public.join_pt1p1p1 t2
-                                 Output: t2.a
-                                 Filter: (t1.a = t2.a)
-                           ->  Seq Scan on public.join_pt1p2 t2_1
+                           ->  Seq Scan on public.join_pt1p1p1 t2_1
                                  Output: t2_1.a
                                  Filter: (t1.a = t2_1.a)
+                           ->  Seq Scan on public.join_pt1p2 t2
+                                 Output: t2.a
+                                 Filter: (t1.a = t2.a)
 (21 rows)
 
 select t1.b, ss.phv from join_ut1 t1 left join lateral
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index d286050c9a..d1ce6ad423 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -144,7 +144,7 @@ SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
            QUERY PLAN
 --------------------------------
  HashAggregate
-   Group Key: pagg_tab.c
+   Group Key: c
    ->  Result
          One-Time Filter: false
 (4 rows)
@@ -159,7 +159,7 @@ SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
            QUERY PLAN
 --------------------------------
  GroupAggregate
-   Group Key: pagg_tab.c
+   Group Key: c
    ->  Result
          One-Time Filter: false
 (4 rows)
--
2.11.0

From 57b8cadddce13952a0a62d37c51dd02c7a436ebc Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Thu, 23 Aug 2018 17:30:18 +0900
Subject: [PATCH 3/3] Only lock partitions that will be scanned by a query

---
 src/backend/optimizer/prep/prepunion.c |  8 +++-----
 src/backend/optimizer/util/relnode.c   | 17 ++++++++++-------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 279f686fb0..6a2adb5f4d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1555,14 +1555,15 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		lockmode = AccessShareLock;
 
 	/* Scan for all members of inheritance set, acquire needed locks */
-	inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+	if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+		inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
 
 	/*
 	 * Check that there's at least one descendant, else treat as no-child
 	 * case. This could happen despite above has_subclass() check, if table
 	 * once had a child but no longer does.
 	 */
-	if (list_length(inhOIDs) < 2)
+	if (rte->relkind != RELKIND_PARTITIONED_TABLE && list_length(inhOIDs) < 2)
 	{
 		/* Clear flag before returning */
 		rte->inh = false;
@@ -1579,10 +1580,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 
 	/* Partitioned tables are expanded elsewhere. */
 	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		list_free(inhOIDs);
 		return;
-	}
 
 	/*
 	 * Must open the parent relation to examine its tupdesc. We need not lock
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b267f07c18..f9bde0c058 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1825,16 +1825,16 @@ build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid)
 {
 	RangeTblEntry *parentrte = root->simple_rte_array[parent->relid];
 	RelOptInfo *result;
+	Index		rootRTindex = 0;
 	Index		partRTindex = 0;
 	RangeTblEntry *partrte = NULL;
 	AppendRelInfo *appinfo = NULL;
 	PlanRowMark *rootrc = NULL;
+	int			lockmode;
 
 	/* Locate the root partitioned table and fetch its PlanRowMark, if any. */
 	if (root->rowMarks)
 	{
-		Index		rootRTindex = 0;
-
 		/*
 		 * The root partitioned table itself might be a child of UNION ALL
 		 * parent, so we must resort to finding the root parent like this.
@@ -1861,13 +1861,16 @@ build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid)
 		rootrc = get_plan_rowmark(root->rowMarks, rootRTindex);
 	}
 
-	/*
-	 * expand_inherited_rtentry alreay locked all partitions, so pass
-	 * NoLock for lockmode.
-	 */
+	/* Determine the correct lockmode to use. */
+	if (rootRTindex == root->parse->resultRelation)
+		lockmode = RowExclusiveLock;
+	else if (rootrc && RowMarkRequiresRowShareLock(rootrc->markType))
+		lockmode = RowShareLock;
+	else
+		lockmode = AccessShareLock;
 	add_inheritance_child_to_query(root, parentrte, parent->relid,
 								   parent->reltype, parent->tupdesc,
-								   rootrc, partoid, NoLock,
+								   rootrc, partoid, lockmode,
 								   &appinfo, &partrte, &partRTindex);
 
 	/* Partition turned out to be a partitioned table with 0 partitions. */
--
2.11.0
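
For readers skimming 0003, the lock mode choice it adds to build_partition_rel can be restated outside the planner as a small standalone sketch. This is only an illustration, not code from the patch: the enum, the function name sketch_partition_lockmode and its two boolean parameters are hypothetical stand-ins for the checks on root->parse->resultRelation and RowMarkRequiresRowShareLock(rootrc->markType). The numeric values mirror PostgreSQL's lock mode constants, but nothing here includes or calls PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for AccessShareLock, RowShareLock, RowExclusiveLock. */
typedef enum SketchLockMode
{
	SKETCH_ACCESS_SHARE_LOCK = 1,
	SKETCH_ROW_SHARE_LOCK = 2,
	SKETCH_ROW_EXCLUSIVE_LOCK = 3
} SketchLockMode;

/*
 * Pick the lock to take on a partition that survived pruning.
 *
 * root_is_result_rel:      the root partitioned table is the target of
 *                          UPDATE/DELETE (assumed to be computed by caller).
 * rowmark_needs_row_share: the root has a row mark that requires RowShareLock,
 *                          e.g. SELECT ... FOR UPDATE (also assumed precomputed).
 */
SketchLockMode
sketch_partition_lockmode(bool root_is_result_rel, bool rowmark_needs_row_share)
{
	/* The result relation check comes first, as in the patch. */
	if (root_is_result_rel)
		return SKETCH_ROW_EXCLUSIVE_LOCK;
	if (rowmark_needs_row_share)
		return SKETCH_ROW_SHARE_LOCK;
	return SKETCH_ACCESS_SHARE_LOCK;
}

/* Small self-check covering the three branches. */
void
sketch_self_check(void)
{
	assert(sketch_partition_lockmode(true, false) == SKETCH_ROW_EXCLUSIVE_LOCK);
	assert(sketch_partition_lockmode(false, true) == SKETCH_ROW_SHARE_LOCK);
	assert(sketch_partition_lockmode(false, false) == SKETCH_ACCESS_SHARE_LOCK);
}
```

The ordering matters when both conditions hold: the result relation check is tested first, so RowExclusiveLock is chosen, which matches the if/else if chain in the hunk above.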