On Mon, 24 Dec 2018 at 09:38, David Rowley <david.row...@2ndquadrant.com> wrote: > Using the above idea, but rather than going to the trouble of storing > PlannerInfo->eq_classes as an array type list, if we build the array > on the fly inside match_foreign_keys_to_quals(), then build a > Bitmapset type index to mark which of the eclasses contains members > for each relation, then I can get the run-time for the function down > to just 0.89%. Looking at other functions appearing high on the > profile I also see; have_relevant_eclass_joinclause() (14%), > generate_join_implied_equalities_for_ecs() (23%).
I've now expanded the proof of concept patch to use this indexing technique for have_relevant_eclass_joinclause() and generate_join_implied_equalities_for_ecs(). With Tom's test from up-thread, I get: Master: latency average = 14125.374 ms Patched: latency average = 2417.164 ms There are some other cases, such as generate_implied_equalities_for_column(), that are possibly also indexable, but in that case, we cannot use ec_relids to help build the index since it does not keep track of other member relation equivalence class members. That function is appearing at about 3.3% of total plan time with the patched version of the code, so there's still some small gains to be had there. The performance of has_relevant_eclass_joinclause() could likely also be improved from this indexing. According to my profiling, it's currently still about 2.6% of total planning time with the patched version. I've attached the updated (rough) proof of concept patch. I ended up stuffing the equivalence class index structure into PlannerInfo so that it would be available in all the places it was required, but build just once higher up the call stack. I don't believe this is the correct solution for a finished patch, but I didn't really have any better ideas and this seemed good enough to demonstrate what the performance could look like. Other ideas I have to further improve the performance of this query would be to move the fkey_list out of PlannerInfo and instead include a per-relation list inside RelOptInfo. This would allow get_foreign_key_join_selectivity() to just look at foreign keys that are relevant to the given relations rather than having to skip all foreign keys that are not. This function is still accounting for about 5.5% of the total planning time for this query. I imagine it wouldn't hurt match_foreign_keys_to_quals() too much to have it loop over each RelOptInfo and look at the foreign keys defined on each of those. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
poc_eclass_indexing_v2.patch
Description: Binary data