On 14 March 2017 at 16:37, David Rowley <david.row...@2ndquadrant.com> wrote: > On 14 March 2017 at 11:35, David Rowley <david.row...@2ndquadrant.com> > wrote: >> >> On 14 March 2017 at 07:50, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> >>> [ getting back to this patch finally... ] >>> >>> David Rowley <david.row...@2ndquadrant.com> writes: >>> > I've attached a patch which implements this, though only for >>> > MergeJoin, else I'd imagine we'd also need to ensure all proofs used >>> > for testing the uniqueness were also hash-able too. I added some XXX >>> > comments in analyzejoin.c around the mergeopfamilies == NIL tests to >>> > mention that Merge Join depends on all the unique proof quals having >>> > mergeopfamilies. This also assumes we'll never use some subset of >>> > mergejoin-able quals for a merge join, which could be an interesting >>> > area in the future, as we might have some btree index on a subset of >>> > those columns to provide pre-sorted input. In short, it's a great >>> > optimisation, but I'm a little scared we might break it one day. >>> >>> Umm ... it's broken already isn't it? See the comment for >>> generate_mergejoin_paths: >>> >> Thanks for looking at this again. >> >> Yeah confirmed. It's broken. I guess I just need to remember in the Path >> if we got all the join quals, although I've not looked in detail what the >> best fix is. I imagine it'll require storing something else in the JoinPath. > > > OK, so I've spent some more time on this and I've come up with a solution. > > Basically the solution is to not skip mark and restore when joinquals > contains any items. This is a requirement for SEMI joins, but overly > cautious for unique joins. However, I think it'll apply in most cases when > Merge Join will be a win, and that's when there's a btree index on both > sides of the join which covers all columns in the join condition. I > carefully commented this part of the code to explain what can be done to > have it apply in more cases. > > This caused me to go and change the following code too: > > @@ -2676,6 +2688,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath > *path, > * it off does not entitle us to deliver an invalid plan. > */ > else if (innersortkeys == NIL && > + !((extra->inner_unique || path->jpath.jointype == JOIN_SEMI) && > + list_length(path->jpath.joinrestrictinfo) == > + list_length(path->path_mergeclauses)) && > !ExecSupportsMarkRestore(inner_path)) > path->materialize_inner = true; > > I've been staring at this for a while and I think it's correct. If it's > wrong then we'd get "ERROR: unrecognized node type: <n>" from ExecMarkPos(). > > Here we must make sure and never skip materializing the inner side, if we're > not going to skip mark and restore in ExecInitMergeJoin(). > > I've attached a patch which fixes the issue. > > Also added a regression test to try to make sure it stays fixed. There's a > few other mostly cosmetic fixes in there too.
Patch is attached which fixes up the conflict between the expression evaluation performance patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
unique_joins_2017-03-27.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers