Hi, thanks for the new patch. I made an additional shrink from your last one. Do you have a look on the attached?
At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowle...@gmail.com> wrote in <caaphdvrkwmmtwkxfn4uazyza9jql1c7uwbjbtuwfr69rqlv...@mail.gmail.com> > On 20 March 2015 at 21:11, David Rowley <dgrowle...@gmail.com> wrote: > > > > I can continue working on your patch if you like? Or are you planning to > > go further with it? It's fine that you continue to work on this. # Sorry for the hardly baked patch which had left many things alone:( > I've been working on this more over the weekend and I've re-factored things > to allow LEFT JOINs to be properly marked as unique. > I've also made changes to re-add support for detecting the uniqueness of > sub-queries. I don't see the point of calling mark_unique_joins for every iteration on join_info_list in remove_useless_joins. The loop already iteraltes on whole the join_info_list so mark_unique_join as an individual function is needless. Finally, simply marking uniqueness of join in join_is_removable seems to be enough, inhibiting early bailing out by the checks on attribute usage and placeholder let it work as expected. Reducing changes to this extent, I can easily see what is added to planning computations. It consists of mainly two factors. - Unique-join chekcs for every candidate inner joins in add_paths_to_joinrel. - Uniqueness check of mergejoinable clause in join-removability check for every left join, some of which would be skipped by other checks before. > Also, I've added modified the costing for hash and nested loop joins to > reduce the cost for unique inner joins to cost the join the same as it does > for SEMI joins. This has tipped the scales on a few plans in the regression > tests. I've forgotten it, but quite important. > Also, please see attached unijoin_analysis.patch. This just adds some code > which spouts out notices when join nodes are initialised which states if > the join is unique or not. Running the regression tests with this patch in > places gives: > > Unique Inner: Yes == 753 hits > Unique Inner: No == 1430 hits > > So it seems we can increase the speed of about 1 third of joins by about > 10%. > A quick scan of the "No"s seems to show quite a few cases which do not look > that real world like. e.g cartesian join. I don't have an idea how many queries in the reality hit this but I suppose it's not a few. > It would be great if someone could run some PostgreSQL application with > these 2 patches applied, and then grep the logs for the Unique Inner > results... Just to get a better idea of how many joins in a real world case > will benefit from this patch. Wow. I think the second patch should be DEBUGx, not NOTICE:) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
>From 659ae5c80ea9e6dc9f6f0a16a43db66b7329a2be Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Tue, 24 Mar 2015 20:59:48 +0900 Subject: [PATCH] Removing individual phase to mark unique joins. mark_unique_joins() makes redundant loops with remove_useless_joins() over join_info_list. So removed the function and moved the core steps into join_is_removable(). --- src/backend/optimizer/path/joinpath.c | 3 +- src/backend/optimizer/plan/analyzejoins.c | 247 +++++++++++------------------- src/backend/optimizer/plan/planmain.c | 3 - src/include/nodes/relation.h | 1 + src/include/optimizer/planmain.h | 1 - 5 files changed, 96 insertions(+), 159 deletions(-) diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index b0bc3f6..909b585 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -296,7 +296,8 @@ add_paths_to_joinrel(PlannerInfo *root, } /* - * left joins were already checked for uniqueness in analyzejoins.c + * left joins might be already checked for uniqueness in + * remove_useless_joins() */ if (sjinfo->is_unique_join) diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 5d0cd2e..990f3b3 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -33,28 +33,12 @@ /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); -static bool specialjoin_is_unique_join(PlannerInfo *root, - SpecialJoinInfo *sjinfo); static void remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids); static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved); static Oid distinct_col_search(int colno, List *colnos, List *opids); -void -mark_unique_joins(PlannerInfo *root, List *joinlist) -{ - ListCell *lc; - - foreach(lc, root->join_info_list) - { - SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); - - if (specialjoin_is_unique_join(root, sjinfo)) - sjinfo->is_unique_join = true; - } -} - /* * remove_useless_joins * Check for relations that don't actually need to be joined at all, @@ -107,12 +91,6 @@ restart: root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo); /* - * We may now be able to mark some joins as unique which we could - * not do before - */ - mark_unique_joins(root, joinlist); - - /* * Restart the scan. This is necessary to ensure we find all * removable joins independently of ordering of the join_info_list * (note that removal of attr_needed bits may make a join appear @@ -173,17 +151,18 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) { int innerrelid; RelOptInfo *innerrel; + Query *subquery = NULL; Relids joinrelids; - int attroff; + List *clause_list = NIL; ListCell *l; + int attroff; + bool removable = true; /* - * Join must not duplicate its outer side and must be a non-delaying left - * join to a single baserel, else we aren't going to be able to do anything - * with it. + * Must be a non-delaying left join to a single baserel, else we aren't + * going to be able to do anything with it. */ - if (!sjinfo->is_unique_join || - sjinfo->jointype != JOIN_LEFT || + if (sjinfo->jointype != JOIN_LEFT || sjinfo->delay_upper_joins) return false; @@ -192,98 +171,38 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) innerrel = find_base_rel(root, innerrelid); - Assert(innerrel->reloptkind == RELOPT_BASEREL); - - /* Compute the relid set for the join we are considering */ - joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + if (innerrel->reloptkind != RELOPT_BASEREL) + return false; /* - * We can't remove the join if any inner-rel attributes are used above the - * join. - * - * Note that this test only detects use of inner-rel attributes in higher - * join conditions and the target list. There might be such attributes in - * pushed-down conditions at this join, too, but in this case the join - * would not have been marked as unique. - * - * As a micro-optimization, it seems better to start with max_attr and - * count down rather than starting with min_attr and counting up, on the - * theory that the system attributes are somewhat less likely to be wanted - * and should be tested last. + * Before we go to the effort of checking whether any innerrel variables + * are needed above the join, make a quick check to eliminate cases in + * which we will surely be unable to prove uniqueness of the innerrel. */ - for (attroff = innerrel->max_attr - innerrel->min_attr; - attroff >= 0; - attroff--) + if (innerrel->rtekind == RTE_RELATION) { - if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids)) + /* + * For a plain-relation innerrel, we only know how to prove uniqueness + * by reference to unique indexes. If there are no indexes then + * there's certainly no unique indexes so there's no point in going + * further. + */ + if (innerrel->indexlist == NIL) return false; } - - /* - * Similarly check that the inner rel isn't needed by any PlaceHolderVars - * that will be used above the join. We only need to fail if such a PHV - * actually references some inner-rel attributes; but the correct check - * for that is relatively expensive, so we first check against ph_eval_at, - * which must mention the inner rel if the PHV uses any inner-rel attrs as - * non-lateral references. Note that if the PHV's syntactic scope is just - * the inner rel, we can't drop the rel even if the PHV is variable-free. - */ - foreach(l, root->placeholder_list) + else if (innerrel->rtekind == RTE_SUBQUERY) { - PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l); + subquery = root->simple_rte_array[innerrelid]->subquery; - if (bms_is_subset(phinfo->ph_needed, joinrelids)) - continue; /* PHV is not used above the join */ - if (bms_overlap(phinfo->ph_lateral, innerrel->relids)) - return false; /* it references innerrel laterally */ - if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids)) - continue; /* it definitely doesn't reference innerrel */ - if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids)) - return false; /* there isn't any other place to eval PHV */ - if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr), - innerrel->relids)) - return false; /* it does reference innerrel */ + /* + * If the subquery has no qualities that support distinctness proofs + * then there's no point in going further. + */ + if (!query_supports_distinctness(subquery)) + return false; } - - return true; -} - -/* - * specialjoin_is_unique_join - * True if it can be proved that this special join will never produce - * more than 1 row per outer row, otherwise returns false if there is - * insufficient evidence to prove the join is unique. - */ -static bool -specialjoin_is_unique_join(PlannerInfo *root, SpecialJoinInfo *sjinfo) -{ - int innerrelid; - RelOptInfo *innerrel; - Query *subquery = NULL; - Relids joinrelids; - ListCell *l; - List *clause_list = NIL; - - /* check if we've already marked this join as unique on a previous call */ - if (sjinfo->is_unique_join) - return true; - - /* if there's more than 1 relation involved then punt */ - if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid)) - return false; - - innerrel = find_base_rel(root, innerrelid); - - if (innerrel->reloptkind != RELOPT_BASEREL) - return false; - - /* - * Before we go to the effort of pulling out the join condition's columns, - * make a quick check to eliminate cases in which we will surely be unable - * to prove uniqueness of the innerrel. - */ - if (!rel_supports_distinctness(root, innerrel)) - return false; + else + return false; /* unsupported rtekind */ /* Compute the relid set for the join we are considering */ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); @@ -293,7 +212,9 @@ specialjoin_is_unique_join(PlannerInfo *root, SpecialJoinInfo *sjinfo) * either the outer rel or a pseudoconstant. If an operator is * mergejoinable then it behaves like equality for some btree opclass, so * it's what we want. The mergejoinability test also eliminates clauses - * containing volatile functions, which we couldn't depend on. + * containing volatile functions, which we couldn't depend on. The + * uniquness might differ from the previous call on the same innerrel so + * always check it regardless of the previous result. */ foreach(l, innerrel->joininfo) { @@ -309,8 +230,10 @@ specialjoin_is_unique_join(PlannerInfo *root, SpecialJoinInfo *sjinfo) !bms_equal(restrictinfo->required_relids, joinrelids)) { /* - * If such a clause actually references the inner rel then we can't - * mark the join as unique. + * If such a clause actually references the inner rel then join + * removal has to be disallowed. We have to check this despite + * the previous attr_needed checks because of the possibility of + * pushed-down clauses referencing the rel. */ if (bms_is_member(innerrelid, restrictinfo->clause_relids)) return false; @@ -333,10 +256,63 @@ specialjoin_is_unique_join(PlannerInfo *root, SpecialJoinInfo *sjinfo) clause_list = lappend(clause_list, restrictinfo); } - if (rel_is_distinct_for(root, innerrel, clause_list)) - return true; + /* + * Some day it would be nice to check for other methods of establishing + * distinctness. + */ + sjinfo->is_unique_join = rel_is_distinct_for(root, innerrel, clause_list); - return false; + if (!sjinfo->is_unique_join) + return false; + + /* + * We can't remove the join if any inner-rel attributes are used above the + * join. + * + * Note that this test only detects use of inner-rel attributes in higher + * join conditions and the target list. There might be such attributes in + * pushed-down conditions at this join, too. We check that case below. + * + * As a micro-optimization, it seems better to start with max_attr and + * count down rather than starting with min_attr and counting up, on the + * theory that the system attributes are somewhat less likely to be wanted + * and should be tested last. + */ + for (attroff = innerrel->max_attr - innerrel->min_attr; + attroff >= 0 && removable; + attroff--) + { + if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids)) + return false; + } + + /* + * Similarly check that the inner rel isn't needed by any PlaceHolderVars + * that will be used above the join. We only need to fail if such a PHV + * actually references some inner-rel attributes; but the correct check + * for that is relatively expensive, so we first check against ph_eval_at, + * which must mention the inner rel if the PHV uses any inner-rel attrs as + * non-lateral references. Note that if the PHV's syntactic scope is just + * the inner rel, we can't drop the rel even if the PHV is variable-free. + */ + foreach(l, root->placeholder_list) + { + PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l); + + if (bms_is_subset(phinfo->ph_needed, joinrelids)) + continue; /* PHV is not used above the join */ + if (bms_overlap(phinfo->ph_lateral, innerrel->relids)) + return false; /* it references innerrel laterally */ + if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids)) + continue; /* it definitely doesn't reference innerrel */ + if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids)) + return false; /* there isn't any other place to eval PHV */ + if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr), + innerrel->relids)) + return false; /* it does reference innerrel */ + } + + return true; } @@ -617,43 +593,6 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list) } return false; /* can't prove rel to be distinct over clause_list */ } -/* - * rel_supports_distinctness - * Returns true if rel has some properties which can prove the relation - * to be unique over some set of columns. - * - * This is effectively a pre-checking function for rel_is_distinct_for(). - * It must return TRUE if rel_is_distinct_for() could possibly return TRUE - */ -bool -rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) -{ - if (rel->rtekind == RTE_RELATION) - { - /* - * For a plain-relation, we only know how to prove uniqueness - * by reference to unique indexes. If there are no indexes then - * there's certainly no unique indexes so there's nothing to prove - * uniqueness on the relation. - */ - if (rel->indexlist != NIL) - return true; - } - else if (rel->rtekind == RTE_SUBQUERY) - { - Query *subquery = root->simple_rte_array[rel->relid]->subquery; - - /* Check if the subquery has any qualities that support distinctness */ - if (query_supports_distinctness(subquery)) - return true; - } - - /* - * Some day it would be nice to check for other methods of establishing - * distinctness. - */ - return false; -} /* * query_supports_distinctness - could the query possibly be proven distinct diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 55310d8..848df97 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -174,9 +174,6 @@ query_planner(PlannerInfo *root, List *tlist, */ fix_placeholder_input_needed_levels(root); - /* Analyze joins to find out which ones have a unique inner side */ - mark_unique_joins(root, joinlist); - /* * Remove any useless outer joins. Ideally this would be done during * jointree preprocessing, but the necessary information isn't available diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 776a269..b7cecb6 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -1407,6 +1407,7 @@ typedef struct SpecialJoinInfo JoinType jointype; /* always INNER, LEFT, FULL, SEMI, or ANTI */ bool lhs_strict; /* joinclause is strict for some LHS rel */ bool delay_upper_joins; /* can't commute with upper RHS */ + List *mergejoinable_clauses; /* cache of mergejoinable clause */ bool is_unique_join; /* matches a max of 1 row per outer join row */ /* Remaining fields are set only for JOIN_SEMI jointype: */ bool semi_can_btree; /* true if semi_operators are all btree */ diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 7a85227..bd1a709 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -126,7 +126,6 @@ extern void mark_unique_joins(PlannerInfo *root, List *joinlist); extern List *remove_useless_joins(PlannerInfo *root, List *joinlist); extern bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list); -extern bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel); extern bool query_supports_distinctness(Query *query); extern bool query_is_distinct_for(Query *query, List *colnos, List *opids); -- 2.1.0.GIT
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers