On 2 April 2016 at 05:52, Tom Lane <t...@sss.pgh.pa.us> wrote:
> David Rowley <david.row...@2ndquadrant.com> writes:
> > On 12 March 2016 at 11:43, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> It seems like the major intellectual complexity here is to figure out
> >> how to detect inner-side-unique at reasonable cost. I see that for
> >> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> >> fine. But for INNER joins it looks like you're just doing it over again
> >> for every candidate join, and that seems mighty expensive.
>
> > I have a rough idea for this, but I need to think of it a bit more to
> > make sure it's bulletproof.
>
> Where are we on this? I had left it alone for awhile because I supposed
> that you were going to post a new patch, but it's been a couple weeks...
Apologies for the delay. I was fully booked. I worked on this today to try
to get it into shape.

Changes:

* Added unique_rels and non_unique_rels caches to PlannerInfo. These are
both arrays of Lists, indexed by the relid of the relation whose uniqueness
is being proved or disproved. Each List item is a Bitmapset containing the
relids of the relations which proved the uniqueness, or, in the
non_unique_rels case, proved that no uniqueness is possible.

The non_unique_rels cache seems far less useful than the unique one: with
the standard join search we start at level one, comparing singleton
relations, and work our way up, so it's only towards the end of the search
that we have more rels on each side of the join. Many of the unique cases
are found early in the search, but the non-unique cases must be rechecked
as more relations are added, since each added relation introduces a new
possibility of uniqueness. In fact, the only cache hit for the non-unique
case in the regression tests comes from a test that uses GEQO, so I'm not
sure the standard join search will ever get a cache hit here. Nevertheless
I kept the cache, as it's likely to be most useful when GEQO is running the
show, since that's when we'll see the largest number of relations to join.

There's also a small quirk which could lead to false negatives for
uniqueness and which still needs to be ironed out: I don't yet attempt to
find the minimum set of relations which proved the uniqueness. This perhaps
doesn't matter much for the standard join search, but likely matters more
for the GEQO cases. Fixing it will require changes to
relation_has_unique_index_for() to make it record and return the relids of
the clauses which matched the index.

* Renamed JOIN_LEFT_UNIQUE to JOIN_SEMI_LEFT. It seems better to keep the
SEMI part. I thought about renaming JOIN_SEMI to JOIN_SEMI_INNER, but
thought I'd better not touch that here.
* Made a pass over the comments in the areas where I've added handling for
JOIN_SEMI_LEFT.

* Also updated the comments in the regression tests to remove references to
unique joins. "Join conversion" seems to be a better term now.

There were also a couple of places where I wasn't quite sure how to handle
JOIN_SEMI_LEFT. I've marked these with an XXX comment. Perhaps how to
handle them is more clear to you.

I was also not quite sure of the best place to allocate memory for these
caches. I've ended up doing that in setup_simple_rel_arrays(), which is
likely the wrong spot. I originally did it in standard_join_search(), after
join_rel_level is allocated, but I soon realised it can't happen there: the
GEQO requires the caches too, and it naturally does not come through
standard_join_search().

I was not quite sure whether I should update any docs to mention that Inner
Joins can become Semi Joins in some cases. I was also a bit unsure whether
I should move the two new functions I added to joinpath.c into
analyzejoins.c.

Thanks for taking an interest in this.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers