On Tue, 25 Dec 2018 at 13:46, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > I however observe failures on 4 regression test suites - inherit, > equivclass, partition_join and partition_prune (diff attached). That's a > bit surprising, because AFAICS the patch merely optimizes the execution > and should not change the planning otherwise (all the failures seem to > be a query plan changing in some way). I'm not sure if the plans are > correct or better than the old ones.
Seems I didn't run the tests after doing a last-minute move of the create_eclass_index() call in make_one_rel(). I'd previously had it just below set_base_rel_pathlists(), which meant that the index didn't exist in some cases so it would fall back on the original code. It appears the use case call from set_base_rel_pathlists() require is_child eclass members to be indexed too, but these were not in v1 or v2 since ec_relids does not record child rel members. It seems simple enough to fix this just by adding ec_allrelids and setting it for all members rather than just non-children members. This also allows me to add the two additional cases to allow generate_implied_equalities_for_column() and has_relevant_eclass_joinclause() to also make use of the eclass index. This further reduces the total planning time for the query on my machine to 2304 ms. > The other thing is that we probably should not use a single test case to > measure the optimization - I doubt it can improve less extreme queries, > but maybe we should verify it does not regress them? Agreed. That still needs to be verified. Although I'm yet unsure what sort of form we could use this idea in. I wonder if it's fine to create this eclass index on the fly, or if it's something we should keep maintained all the time. The problem I saw with keeping it all the time is down to eq_classes being a List and list_nth() is slow, which means the bms_next_member() loops I've added would perform poorly when compiled with a linked list lookup. > With the patch attached, bms_overlap gets quite high in the profiles. I > think one reason for that is that all bitmap operations (not just > bms_overlap) always start the loop at 0. For the upper boundary is > usually determined as Min() of the lengths, but we don't do that for > lower boundary because we don't track that. The bitmaps for eclasses are > likely sparse, so this is quite painful. Attaches is a simple patch that > adds tracking of "minword" and uses it in various bms_ methods to skip > initial part of the loop. On my machine this reduces the timings by > roughly 5% (from 1610 to 1530 ms). Seems like an interesting idea, although for the most part, Bitmapsets are small and this adds some overhead to all use cases. I doubt it is worth the trouble for bms_is_member(), since that does not need to perform a loop over each bitmapword. It looks like most of the bms_overlap() usages are caused by functions like have_join_order_restriction(), join_is_legal(), check_outerjoin_delay() and add_paths_to_joinrel(), all of which are performing a loop over the join_info_list. Perhaps that could be indexed a similar way to the eq_classes List. I'd imagine there's about another 20-30% of performance to squeeze out of this by doing that, plus a bit more by making the fkey_list per RelOptInfo. Attached is v3 of the hacked up proof of concept performance demo patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
poc_eclass_indexing_v3.patch
Description: Binary data