On 14 March 2017 at 07:50, Tom Lane <t...@sss.pgh.pa.us> wrote: > [ getting back to this patch finally... ] > > David Rowley <david.row...@2ndquadrant.com> writes: > > I've attached a patch which implements this, though only for > > MergeJoin, else I'd imagine we'd also need to ensure all proofs used > > for testing the uniqueness were also hash-able too. I added some XXX > > comments in analyzejoin.c around the mergeopfamilies == NIL tests to > > mention that Merge Join depends on all the unique proof quals having > > mergeopfamilies. This also assumes we'll never use some subset of > > mergejoin-able quals for a merge join, which could be an interesting > > area in the future, as we might have some btree index on a subset of > > those columns to provide pre-sorted input. In short, it's a great > > optimisation, but I'm a little scared we might break it one day. > > Umm ... it's broken already isn't it? See the comment for > generate_mergejoin_paths: > > * We generate mergejoins if mergejoin clauses are available. We have > * two ways to generate the inner path for a mergejoin: sort the cheapest > * inner path, or use an inner path that is already suitably ordered for > the > * merge. If we have several mergeclauses, it could be that there is no > inner > * path (or only a very expensive one) for the full list of mergeclauses, > but > * better paths exist if we truncate the mergeclause list (thereby > discarding > * some sort key requirements). So, we consider truncations of the > * mergeclause list as well as the full list. (Ideally we'd consider all > * subsets of the mergeclause list, but that seems way too expensive.) > > There's another, more subtle, way in which it could fail in > sort_inner_and_outer(): > > * Each possible ordering of the available mergejoin clauses will generate > * a differently-sorted result path at essentially the same cost. We have > * no basis for choosing one over another at this level of joining, but > * some sort orders may be more useful than others for higher-level > * mergejoins, so it's worth considering multiple orderings. > * > * Actually, it's not quite true that every mergeclause ordering will > * generate a different path order, because some of the clauses may be > * partially redundant (refer to the same EquivalenceClasses). Therefore, > * what we do is convert the mergeclause list to a list of canonical > * pathkeys, and then consider different orderings of the pathkeys. > > I'm fairly sure that it's possible to end up with fewer pathkeys than > there are mergeclauses in this code path. Now, you might be all right > anyway given that the mergeclauses must refer to the same ECs in such a > case --- maybe they're fully redundant and we can take testing the > included clause as proving the omitted one(s) too. I'm not certain > right now what I meant by "partially redundant" in this comment. > But in any case, it's moot for the present purpose because > generate_mergejoin_paths certainly breaks your assumption. >
Thanks for looking at this again. Yeah confirmed. It's broken. I guess I just need to remember in the Path if we got all the join quals, although I've not looked in detail what the best fix is. I imagine it'll require storing something else in the JoinPath. Here's my test case: select 'create tablespace slow_disk LOCATION ''' || current_setting('data_directory') || ''' WITH (random_page_cost=1000);'; \gexec create table ab (a int not null, b int not null); create unique index ab_pkey on ab (a,b) tablespace slow_disk; alter table ab add constraint ab_pkey primary key using index ab_pkey; create index ab_a_idx on ab (a); insert into ab values(1,1); insert into ab values(1,2); set enable_nestloop to 0; set enable_hashjoin to 0; set enable_sort to 0; explain verbose select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b; QUERY PLAN ------------------------------------------------------------------------------------- Merge Join (cost=0.26..24.35 rows=1 width=16) Output: ab1.a, ab1.b, ab2.a, ab2.b Inner Unique: Yes Merge Cond: (ab1.a = ab2.a) Join Filter: (ab1.b = ab2.b) -> Index Scan using ab_a_idx on public.ab ab1 (cost=0.13..12.16 rows=2 width=8) Output: ab1.a, ab1.b -> Index Scan using ab_a_idx on public.ab ab2 (cost=0.13..12.16 rows=2 width=8) Output: ab2.a, ab2.b (9 rows) select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b; -- wrong results a | b | a | b ---+---+---+--- 1 | 2 | 1 | 2 (1 row) drop index ab_a_idx; -- same query again. This time it'll use the PK index on the slow_disk tablespace select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b; a | b | a | b ---+---+---+--- 1 | 1 | 1 | 1 1 | 2 | 1 | 2 (2 rows) I had to fix up some conflicts before testing this again on a recent master, so I've attached my rebased version. No fix yet for the above issue -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
unique_joins_2017-03-14.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers