On Mon, 21 Jan 2019 at 09:51, Kim Rose Carlsen <k...@hiper.dk> wrote: > remove_useless_join does not prove uniqueness if the unique index is partial, > and therefore wont remove the join if no columns are referenced (see example > in bottom). > > I have been trying to look around the source code and from what I have > identified the problem seems to be that "check_index_predicates(..)" happens > after "remove_useless_join(..)", and therefore cannot see that the unique > index is actually covered by the join condition.
The main reason that join removal happens so early on in planning is that we want to forego doing as much work as possible on a relation that that might get removed. > From analyzejoins.c:612, rel_supports_distinctness(..) > if (ind->unique && ind->immediate && > (ind->indpred == NIL || ind->predOK)) > return true; This is really just a precheck to see if there are any unique indexes which may serve as proof that the join does not duplicate any rows from the other side of the join. If this fails then the code only knows not to bother looking any further. If it passes then more work needs to be done to see if the relation supports distinctness. > I have tried to add check_index_predicates(..) to > rel_supports_distinctness(..) and this produces the expected plan, but I have > no idea of the implication of doing check_index_predicates(..) earlier. Looking at check_index_predicates() it makes use of root->all_baserels, which only gets set in make_one_rel() which is well after the join removal is performed. So it does look like there's a bit of a chicken and egg problem there around which relations to use in generate_join_implied_equalities(). Moving the check_index_predicates() call earlier would cause it to miss using these additional quals completely due to all_baserels being empty. I'm unsure if join removing a relation that we've found a qual during generate_join_implied_equalities() in would be very safe. I'm not that sure if it would even be possible to remove such a relation. That would require a bit of research. Maybe it might be worth thinking about making predOK have 3 possible values, with the additional one being "Unknown". We could then consider calling predicate_implied_by() in relation_has_unique_index_for() for indexes that are unique and immediate but predOK is still unknown. That might reduce the additional work to a level that might be acceptable. The extra check could either be done before or after the column matching code in relation_has_unique_index_for(). I guess either one is equally as likely to fail, but one may be cheaper than the other to perform. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services