On 9/23/25 21:46, Tom Lane wrote: > [ sorry for ridiculously slow response to this ] > > Tomas Vondra <[email protected]> writes: >> Here's a patch trying to do it more like this - by manipulating the >> lists describing the join problems, before it's passed the the actual >> join search algorithm (which is where the PoC patch did this). >> I wonder if this is roughly the place where you imagined this would be >> done, or if you envision some other issue with this approach. > > Cool. This is proof-of-concept that manipulating the joinlist can > do what we need done. So we can move on to what heuristics we need > to use. >
Thanks. Good to hear this seems like a reasonable place. >> I initially tried to manipulate the joinlist much earlier - pretty much >> right at the end of deconstruct_jointree. But that turned out to be way >> too early. To identify dimensions etc. we need to check stuff about >> foreign keys, join clauses, ... and that's not available that early. > >> So I think this needs to happen much later in query_planner(), and the >> patch does it right before the make_one_rel() call. Maybe that's too >> late, but it needs to happen after match_foreign_keys_to_quals(), as it >> relies on some of the FK info built by that call. Maybe we could call >> match_foreign_keys_to_quals() earlier, but I don't quite see any >> benefits of doing that ... > > I don't have a problem with doing it where you did it, but the comment > should explain the placement. What you do have in the comment mostly > belongs with the code, too; it's not the business of the caller. So > in planmain.c something like > > + /* > + * Try to simplify the join search problem for starjoin-like joins. > + * This step relies on info about FK relationships, so we can't do it > + * till after match_foreign_keys_to_quals(). > + */ > > would be more appropriate IMO. I agree. I've adopted your wording, and moved the original comment to starjoin_adjust_joins (with some changes). > I'd be slightly inclined to put the GUC test there, too: > > + if (enable_starjoin_join_search) > + joinlist = starjoin_adjust_joins(root, joinlist); > I'm not sure I like this very mcuh. No other call in query_planner() does it like that. Most don't have such GUC, but at least remove_useless_self_joins does, and it still doesn't check it here. > > I agree that you need to worry about join order restrictions, > and that it's not immediately clear how to do that. join_is_legal > would work if we could call it, but the problem is that at this > stage we'll only have RelOptInfos for base rels not join rels. > If we have a joinlist (A B C D) and we are considering treating > C as a dimension table, then the questions we have to ask are: > (a) is it okay to build the (A B D) join without C? > (b) is it okay to join (A B D) to C? > > In this simple case, I think (b) must be true if (a) is, but > I'm not quite sure that that's so in more complex cases with > multiple candidates for dimension tables. In any case, > join_is_legal won't help us if we don't have join RelOptInfos. > > I'm inclined to start by using has_join_restriction: if that > says "false" for a candidate dimension table, it should be safe > to postpone the join to the dimension table. We might be able > to refine that later. > Thanks. I agree has_join_restriction seems like a good start, I'll give that a try in the next patch version. When thinking about this, I realized the has_join_restriction() is only ever used in join_search_one_level(), i.e. when dealing with each small join order problem. Doesn't this mean the deconstructed jointree must already consider the restrictions in some way? I don't see any explicit mentions of such join order restrictions in deconstruct_recurse. It must not violate any ordering restrictions by splitting the joins in a "wrong" way, right? If I set join_collapse_limit=1 it still needs to satisfy all the rules. I was wondering if maybe we could piggy-back on that, somehow. But maybe that's not very practical, and has_join_restriction() is the way to go. It's been a while since I looked at this patch, so it's possible I already concluded that wouldn't work, and forgot about it :-/ >> The second example (create-2/select-2) is quite different, in that it's >> nor a starjoin schema. It still joins one "main" table with multiple >> "dimensions", but the FKs go in the other direction (to a single column >> in the main table). But it has a very similar bottleneck - the order of >> the joins is expensive, but it often does not matter very much, because >> the query matches just one row anyway. And even if it returns more rows, >> isn't the join order determined just by the selectivity of each join? >> Maybe the starjoin optimization could be made to work for this type too? > > Yeah, I'm slightly itchy about relying on FKs in this heuristic at > all; it doesn't seem like quite the right thing. I think we do want > one side of the join to be joining to a PK or at least a unique index, > but I'm not sure we need to insist on there being an FK relationship. > True, requiring the FK may be unnecessary in this case. We do need to guarantee the cardinality does not change, but a UNIQUE + LEFT JOIN seems to be enough for that. BTW the v3 lost the patch/ directory. I assume that wasn't intentional, so I added it back into v4. > A couple of minor coding notes: > > * There's no point in doing anything (except recursing) if the joinlist > contains fewer than 3 items, and maybe as a further heuristic > this shouldn't kick in till later yet, like 5 or so items. > When there are just a few items, the possibility of missing the > best plan seems to outweigh the minimal savings in plan time. > > * Joinlists never contain anything but RangeTblRefs and sub-lists. > See make_rel_from_joinlist. > > * Your reconstructed joinlist is overly complicated. Instead of > > + newlist = list_make2(newlist, list_make1(lfirst(lc))); > > you could just do > > + newlist = list_make2(newlist, lfirst(lc)); > > because a single-element subproblem is useless. > > I notice that the patch doesn't apply cleanly anymore because of > the introduction of guc_parameters.dat. So here's a v3 that > rebases over that, and I took the liberty of fixing the joinlist > construction as above, but I didn't do anything else. > Thanks. I agree with all of these comments, and updated v4 accordingly. cfbot started complaining about guc_parameters.dat and a couple more things, so v4 fixes that too. regards -- Tomas Vondra
From b8473cd5c93bf594896d2be40a7f093e90aeca16 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <[email protected]> Date: Sat, 8 Nov 2025 18:16:30 +0100 Subject: [PATCH v4] Simplify planning of starjoin queries --- patch/create-1.sql | 18 + patch/create-2.sql | 11 + patch/select-1.sql | 9 + patch/select-2.sql | 9 + src/backend/optimizer/plan/analyzejoins.c | 443 ++++++++++++++++++ src/backend/optimizer/plan/planmain.c | 7 + src/backend/utils/misc/guc_parameters.dat | 7 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/optimizer/planmain.h | 2 + src/test/regress/expected/sysviews.out | 3 +- 10 files changed, 509 insertions(+), 1 deletion(-) create mode 100644 patch/create-1.sql create mode 100644 patch/create-2.sql create mode 100644 patch/select-1.sql create mode 100644 patch/select-2.sql diff --git a/patch/create-1.sql b/patch/create-1.sql new file mode 100644 index 00000000000..8df04fd0677 --- /dev/null +++ b/patch/create-1.sql @@ -0,0 +1,18 @@ +create table dim1 (id int primary key, val1 text); +create table dim2 (id int primary key, val2 text); +create table dim3 (id int primary key, val3 text); +create table dim4 (id int primary key, val4 text); +create table dim5 (id int primary key, val5 text); +create table dim6 (id int primary key, val6 text); +create table dim7 (id int primary key, val7 text); + +create table t (id serial primary key, + id1 int references dim1(id), + id2 int references dim2(id), + id3 int references dim3(id), + id4 int references dim4(id), + id5 int references dim5(id), + id6 int references dim6(id), + id7 int references dim7(id)); + +vacuum analyze; diff --git a/patch/create-2.sql b/patch/create-2.sql new file mode 100644 index 00000000000..cdf612dde8f --- /dev/null +++ b/patch/create-2.sql @@ -0,0 +1,11 @@ +create table t (id serial primary key, a text); + +create table dim1 (id1 int primary key references t(id), val1 text); +create table dim2 (id2 int primary key references t(id), val2 text); +create table dim3 (id3 int primary key references t(id), val3 text); +create table dim4 (id4 int primary key references t(id), val4 text); +create table dim5 (id5 int primary key references t(id), val5 text); +create table dim6 (id6 int primary key references t(id), val6 text); +create table dim7 (id7 int primary key references t(id), val7 text); + +vacuum analyze; diff --git a/patch/select-1.sql b/patch/select-1.sql new file mode 100644 index 00000000000..1535ddcdc8f --- /dev/null +++ b/patch/select-1.sql @@ -0,0 +1,9 @@ +--set join_collapse_limit = 1; +select * from t + join dim1 on (dim1.id = id1) + join dim2 on (dim2.id = id2) + join dim3 on (dim3.id = id3) + join dim4 on (dim4.id = id4) + join dim5 on (dim5.id = id5) + join dim6 on (dim6.id = id6) + join dim7 on (dim7.id = id7); diff --git a/patch/select-2.sql b/patch/select-2.sql new file mode 100644 index 00000000000..4e1d2a7b0e7 --- /dev/null +++ b/patch/select-2.sql @@ -0,0 +1,9 @@ +-- set join_collapse_limit = 1; +select * from t + left join dim1 on (id = id1) + left join dim2 on (id = id2) + left join dim3 on (id = id3) + left join dim4 on (id = id4) + left join dim5 on (id = id5) + left join dim6 on (id = id6) + left join dim7 on (id = id7); diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index e592e1ac3d1..66b1c8c540c 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -51,6 +51,7 @@ typedef struct } SelfJoinCandidate; bool enable_self_join_elimination; +bool enable_starjoin_join_search; /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); @@ -2511,3 +2512,445 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist) return joinlist; } + +/* + * starjoin_match_to_foreign_key + * Try to match a join to a FK constraint. + * + * For a relation to be a dimension (for the starjoin heuristics), it needs + * to be joined through a FK constraint. The dimension is expected to be + * on the PK side of the join. The relation must not have any additional + * join clauses, beyond those matching the foreign key. + * + * We already have a list of relevant foreign keys, and we use that info + * for selectivity estimation in get_foreign_key_join_selectivity(). And + * we're actually doing something quite similar here. + * + * XXX Do we need to worry about the join type, e.g. inner/outer joins, + * semi/anti? get_foreign_key_join_selectivity() does care about it, and + * ignores some of those cases. Maybe we should too? + * + * XXX Check there are no other join clauses, beyond those matching the + * foreign key. But do we already have the joininfo at this point? Some + * of this stuff gets build only during the join order search later. + * The match_foreign_keys_to_quals() probably needs to be aware of all + * this, so how does it do that? + */ +static bool +starjoin_match_to_foreign_key(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + + /* Consider each FK constraint that is known to match the query */ + foreach(lc, root->fkey_list) + { + ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc); + int nmatches = 0; + + /* rel is not the referenced table of the FK */ + if (fkinfo->ref_relid != rel->relid) + continue; + + /* + * Do we have a match for each key of the FK? + * + * XXX get_foreign_key_join_selectivity checks EquivalenceClasses, + * we should probably (definitely) do that here too. + * + * XXX We should check that all the clauses have the same relation + * on the other side (for multi-column keys). And that there are + * no other join clauses other than those matching the FK. + * + * XXX Do we need to check that the FK side of the join (i.e. the fact + * table) has the columns referenced as NOT NULL? Otherwise we could + * have a FK join that reduces the cardinality, which is one of + * the arguments why it's fine to move the join (that it doesn't + * change the cardinality). But if the join is LEFT JOIN, this + * should be fine too - but do we get here with LEFT JOINs? + * + * XXX Do we need to check if the other side of the FK is in the + * current join list? Maybe it's in some later one? + */ + for (int i = 0; i < fkinfo->nkeys; i++) + { + bool has_matching_clause = false; + + /* + * Is there a clause matching this FK key? + * + * XXX We need to make sure it's a valid match, e.g. that the + * same referencing table matches all keys in a composite FK, + * and so on. + * + * XXX Do we need to check some relationship to other rels in + * the same jointree item? E.g. the referencing table should + * not be a dimensions we already removed. + */ + if ((fkinfo->rinfos[i] != NULL) || (fkinfo->eclass[i] != NULL)) + { + has_matching_clause = true; + nmatches++; + continue; + } + + /* found a FK key without a matching join clause, ignore the FK */ + if (has_matching_clause) + break; + } + + /* matched all FK keys */ + if (nmatches == fkinfo->nkeys) + { + return true; + } + } + + return false; +} + + +/* + * starjoin_is_dimension + * Determine if a range table entry is a dimension in a starjoin. + * + * To be considered a dimension for the simplified join order search, the + * join must not affect the cardinality of the join. We ensure that by + * requiring a couple things: + * + * 1) the join clause has to match a FK (that is, the fact does have at + * most one matching row in the dimension) + * + * 2) the FK side (the fact table) should be marked as NOT NULL, so that + * we know there's exactly one dimension row for each fact table row + * + * 3) there must be no additional restrictions on the relation (that + * might eliminate some of the rows, reducing the cardinality) + * + * XXX The Implementation is incomplete. It probably needs more thought, + * considering some join types would allow relaxing some of the checks + * (e.g. outer joins may not require checking (2) or possibly even (3), + * depending on where the condition is, what varnullingrels it has). + * + * XXX I wonder if we could handle (3) by ordering the dimensions by the + * selectivity of the restriction. There are no join clauses between the + * dimensions (ignoring the snowflake joins, but even there the clauses + * don't go between branches), so the selectivity could be treated as + * a measure of how much it shrinks the join result. So we could just + * sort the dimensions by this, starting with the lowest selectivity + * (close to 0.0), and ending with dimensions without restrictions (in + * which case the selectivity is 1.0). + * + * XXX If the join in INNER, and the fact side has NULL values in the + * join key, we might consider nullfrac as restriction. + * + * XXX I'm not sure how careful this needs to be about join order + * restrictions. Maybe it should call have_relevant_joinclause and + * have_join_order_restriction, to ensure the join order is OK? + * + * The optimizer/README is not very clear about this, but maybe it's + * a too specific question. It seems to say the relations in those + * lists can be joined in any order (lines 94 and 106). Maybe that's + * not what it means, or I'm misunderstanding it. + * + * It however seems has_join_restrictions() in join_search_one_level() + * forces the code to look only at "earlier" rels in the list + * + * first_rel = foreach_current_index(r) + 1 + * + * So maybe we just need to stop once we find a rel with a restriction, + * as determined byhas_join_restrictions()? + * + * But there's also join_is_legal() to check legality of joins, with + * LEFT/RIGHT joins, and IN/EXISTS clauses. See README line 188. And it + * also looks-up the SpecialJoinInfo for the join. So maybe we should + * lookup RelOptInfo for both sides of the join, and call join_is_legal + * on that? Might be too expensive, though. Maybe do that only when + * has_join_restrictions already says yes? + * + * Maybe we should use has_join_restrictions(), but in a different way. + * We could still treat rels with restrictions as dimensions, and move + * that to the separate list (that doesn't change the join order), but + * stop once we hit the first non-dimension with a restriction? Because + * if any relation after that was a dimention, we wouldn't be able to + * move it to the separate list. It'd change the join order in a way + * that might violate the restriction. I believe that's the idea behind + * first_rel in join_search_one_level(), but maybe not. + * + * Perhaps have_join_order_restriction and have_relevant_joinclause are + * useful for this, rather than has_join_restrictions? We might look at + * actual pairs of relations, and/or check there's no join order + * restriction with respect to the relations we skipped/moved to the + * list of dimension? + * + * AFAICS it's just the skipping that can break the order restrictions? + * Adding something to the list of dimensions keeps the order (at least + * with respect to the rels after it). + */ +static bool +starjoin_is_dimension(PlannerInfo *root, RangeTblRef *rtr) +{ + Index rti = rtr->rtindex; + RangeTblEntry *rte = root->simple_rte_array[rti]; + RelOptInfo *rel = root->simple_rel_array[rti]; + + /* only plain relations can be dimensions (we need FK/PK join) */ + if ((rte->rtekind != RTE_RELATION) || + (rel->reloptkind != RELOPT_BASEREL)) + return false; + + /* + * Does it have any conditions/restrictions that may affect the number + * of rows matched? If yes, don't treat as dimension. + * + * Dimensions in a starjoin may have restrictions, but that means it'll + * change cardinality of the joins (reduce it), so it may be better to + * join it early. We leave it to the regular join order planning. The + * expectation is that most dimensions won't have extra restrictions. + * + * XXX Should we check some other fields, like lateral references, and + * so on? Or is that unnecessary? What about eclasses? + */ + if (rel->baserestrictinfo != NIL) + return false; + + /* + * See if the join clause matches a foreign key. It should match a + * single relation on the other side, and the dimension should be on + * the PK side. + * + * XXX loosely inspired by get_foreign_key_join_selectivity() + */ + if (!starjoin_match_to_foreign_key(root, rel)) + return false; + + /* + * XXX Maybe some additional checks here ... + */ + + return true; +} + +/* + * starjoin_adjust_joins + * Adjust the jointree for starjoins, to simplify the join order search. + * + * Try to simplify the join search problem for starjoin-like joins, with + * joins over FK relationships. The dimensions can be joined in almost + * any order, which is about the worst case for the standard join order + * search, and can be close to factorial complexity. But there's not much + * difference between such join orders, so we just leave the dimensions at + * the end of each group (as determined by the join_collapse_limit earlier). + * + * The join search for starjoin queries is surprisingly expensive, because + * there are very few join order restrictions. Consider a starjoin query + * + * SELECT * FROM f + * JOIN d1 ON (f.id1 = d1.id) + * JOIN d2 ON (f.id2 = d2.id) + * ... + * JOIN d9 ON (f.id9 = d9.id) + * + * There are no clauses between the dimension tables (d#), which means those + * tables can be joined in almost arbitrary order. This means the standard + * join_order_search() would explore a N! possible join orders. It is not + * that bad in practice, because we split the problem by from_collapse_limit + * into a sequence of smaller problems, but even for the default limit of + * 8 relations it's quite expensive. This can be easily demonstrated by + * setting from_collapse_limit=1 for example starjoin queries. + * + * The idea here is to apply a much simpler join order search for this type + * of queries, without too much risk of picking a much worse plans. It is + * however a trade off between how expensive we allow this to be, and how + * good the decisions will be. This can help only starjoins with multiple + * dimension tables, and we don't want to harm planning of other queries, + * so the basic "query shape" detection needs to be very cheap. And then + * it needs to be cheaper than the regular join order search. + * + * If a perfect check is impossible or too expensive, it's better to end + * up with a cheap false negative (i.e. and not use the optimization), + * rather than risk regressions in other cases. + * + * The simplified join order search relies on the fact that if the joins + * to dimensions do not alter the cardinality of the join relation, then + * the relative order of those joins does not matter. All the possible + * orders are guaranteed to perform the same. So we can simply pick one + * of those orders, and "hardcode" it in the join tree later passed to the + * join_order_search(). + * + * The query may involve joins to additional (non-dimension) tables, and + * those may alter cardinality. Some joins may increase it, other joins + * may decrease it. In principle, it'd be best to first perform all the + * joins that reduce join size, then join all the dimensions, and finally + * perform joins that may increase the join size. But this is not done + * now, currently we simply apply all the dimensions at the end, hoping + * that the earlier joins did not inflate the join too much. + * + * The definition of a dimension is a bit vague. For our definition see + * the comment at starjoin_is_dimension(). + * + * The optimization works by manipulating the joinlist (originally built + * by deconstruct_jointree), which decomposed the original jointree into + * smaller "problems" depending on join type and join_collapse_limit. We + * inspect those smaller lists, and selectively split them into smaller + * problems to force a join order. This may effectively undo some of the + * merging done by deconstruct_jointree(), which tries to build problems + * with up to join_collapse_limit relations. + * + * For example, imagine a join problem with 8 rels - one fact table and + * then 7 dimensions, which we can represent a joinlist with 8 elements. + * + * (D7, D6, D5, D4, D3, D2, D1, F) + * + * Assuming all those joins meet the requirements (have a matching FK, + * don't affect the join cardinality, ...), then we can split this into + * + * (D7, (D6, (D5, (D4, (D3, (D2, (D1, F))))))) + * + * which is a nested joinlist, with only two elements on each level. That + * means there's no need for expensive join order search, there's only + * one way to join the relations (two, if we consider the relations may + * switch sides). + * + * The joinlist may already be nested, with multiple smaller subproblems. + * We look at each individual join problem independently, i.e. we don't + * try to merge problems to build join_collapse_limit problems again. + * This is partially to keep it cheap/simple, but also so not change + * behavior for cases when people use join_collapse_limit to force some + * particular join shape. + * + * XXX A possible improvement is to allow handling snowflake joins, i.e. + * recursive dimensions. That would require a somewhat more complicated + * processing, because a dimension would be allowed other rels, as long + * as those are dimensions too. And we'd need to be more careful about + * the order in which join them to the top of the join. + * + * XXX One possible risk is that moving the dimension joins at the very + * end may move that after joins that increase the cardinality. Which + * may cause a regression. Such joins however don't seem very common, at + * least in regular starjoin queries. So maybe we could simply check if + * there are any such joins and disable this optimization. Is there a + * cheap way to identify that a join increases cardinality? + * + * XXX Ideally, we'd perform the dimension joins at the place with the + * lowest cardinality. Imagine a joinlist + * + * (D1, D2, A, B, F) + * + * Where A increases join cardinality, while B does not (possibly even + * reduces it). Ideally, we'd do the join like this + * + * (A, (D2, (D1, (B, F)))) + * + * so D1/D2 get joined at the point of "lowest cardinality". We probably + * don't want to do all this cardinality estimation work here, it'd copy + * what we already do in the join_order_search(). Perhaps we could invent + * a "join item" representing a join to all those dimensions, and pass it + * to join_order_search()? And let it pick the right place for it? It'd + * always join them in the same order, it'd not reorder them. It would + * still do the regular cardinality estimations etc. It would be trivial + * to disable the optimization if needed - don't collapse the dimensions + * into the new type of join item. + */ +List * +starjoin_adjust_joins(PlannerInfo *root, List *joinlist) +{ + ListCell *lc; + List *newlist = NIL; + List *dimensions = NIL; + int nlist = list_length(joinlist); + + /* Do nothing if starjoin optimization not enabled. */ + if (!enable_starjoin_join_search) + return joinlist; + + /* + * Do nothing if starjoin optimization not applicable. + * + * XXX It might seems we can skip this for lists with <= 2 items, but + * that does not work, the elements may be nested list, and we need to + * descend into those. So do what remove_useless_self_joins() does, and + * only bail out for the simplest single-relation case (i.e. no joins). + */ + if (joinlist == NIL || + (nlist == 1 && !IsA(linitial(joinlist), List))) + return joinlist; + + /* + * Process the current join problem - split the elements into dimensions + * and non-dimensions. If there are dimensions, add them back at the end, + * as small single-rel joins. + * + * The list may contain various types of elements. It may contain a list, + * which means it's an independent join search problem and we need to + * process it recursively. Or it may be RangeTblRef, in which case we + * need to check if it's a dimension. Other types of elements are just + * added back to the list as-is. + * + * XXX I think we need to be careful to keep the order of the list (for + * the non-dimension entries). The join_search_one_level() relies on + * that when handling join order restrictions. + * + * XXX It might be better to only create a new list if needed, i.e. once + * we find the first dimension. So that non-starjoin queries don't pay + * for something they don't need. A mutable iterator might be a way, but + * I'm not sure how expensive this really is. + */ + foreach (lc, joinlist) + { + Node *item = (Node *) lfirst(lc); + + /* a separate join search problem, handle it recursively */ + if (IsA(item, List)) + { + newlist = lappend(newlist, + starjoin_adjust_joins(root, (List *) item)); + continue; + } + + /* + * If it's not a List, it has to be a RangeTblRef - jinlists can't + * contain any other elements (see make_rel_from_joinlist). + */ + Assert(IsA(item, RangeTblRef)); + + /* + * An entry representing a baserel. If it's a dimension, save it in + * a separate list, and we'll add it at the "top" of the join at the + * end. Otherwise add it to the list just like other elements. + * + * We do this only when the joinlist has at least 3 items. For fewer + * rels the optimization does not matter, there's only a single join + * order anyway. That only skips the optimization on this level - we + * still do the recursion, and that might hit a larger join problem. + * + * XXX We might have a new GUC to customize the cutoff limit, but for + * now it seems good enough to do it whenever applicable. If we find + * it's not worth it for less than N rels, we can add it later. + */ + if ((nlist >= 3) && + starjoin_is_dimension(root, (RangeTblRef *) item)) + { + dimensions = lappend(dimensions, item); + continue; + } + + /* not a dimension, add it to the list directly */ + newlist = lappend(newlist, item); + } + + /* + * If we found some dimensions, add them to the join tree one by one. + * The exact order does not matter, so we add them in the order we + * found them in the original list. + * + * We need to add them by creating smaller 2-element lists, with the rest + * of the list being a sub-problem and then adding the dimension + * table. This is how we force the desired join order. + */ + foreach (lc, dimensions) + { + newlist = list_make2(newlist, lfirst(lc)); + } + + return newlist; +} diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index eefc486a566..14abbbbd361 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -291,6 +291,13 @@ query_planner(PlannerInfo *root, */ distribute_row_identity_vars(root); + /* + * Try to simplify the join search problem for starjoin-like joins. + * This step relies on info about FK relationships, so we can't do it + * till after match_foreign_keys_to_quals(). + */ + joinlist = starjoin_adjust_joins(root, joinlist); + /* * Ready to do the primary planning. */ diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat index 1128167c025..510241e75be 100644 --- a/src/backend/utils/misc/guc_parameters.dat +++ b/src/backend/utils/misc/guc_parameters.dat @@ -968,6 +968,13 @@ boot_val => 'true', }, +{ name => 'enable_starjoin_join_search', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD', + short_desc => 'Enables simplified join order planning for starjoins.', + flags => 'GUC_EXPLAIN', + variable => 'enable_starjoin_join_search', + boot_val => 'false', +}, + { name => 'enable_tidscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD', short_desc => 'Enables the planner\'s use of TID scan plans.', flags => 'GUC_EXPLAIN', diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index f62b61967ef..4e5542f690c 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -424,6 +424,7 @@ #enable_presorted_aggregate = on #enable_seqscan = on #enable_sort = on +enable_starjoin_join_search = off #enable_tidscan = on #enable_group_by_reordering = on #enable_distinct_reordering = on diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 00addf15992..c30ac3a5754 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -21,6 +21,7 @@ #define DEFAULT_CURSOR_TUPLE_FRACTION 0.1 extern PGDLLIMPORT double cursor_tuple_fraction; extern PGDLLIMPORT bool enable_self_join_elimination; +extern PGDLLIMPORT bool enable_starjoin_join_search; /* query_planner callback to compute query_pathkeys */ typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra); @@ -120,6 +121,7 @@ extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, JoinType jointype, List *restrictlist, bool force_cache, List **extra_clauses); extern List *remove_useless_self_joins(PlannerInfo *root, List *joinlist); +extern List *starjoin_adjust_joins(PlannerInfo *root, List *joinlist); /* * prototypes for plan/setrefs.c diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 3b37fafa65b..0c3b1eeaba7 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -172,8 +172,9 @@ select name, setting from pg_settings where name like 'enable%'; enable_self_join_elimination | on enable_seqscan | on enable_sort | on + enable_starjoin_join_search | off enable_tidscan | on -(25 rows) +(26 rows) -- There are always wait event descriptions for various types. InjectionPoint -- may be present or absent, depending on history since last postmaster start. -- 2.51.1
