On 28 January 2017 at 05:44, Tom Lane <t...@sss.pgh.pa.us> wrote: > I wrote: >> David Rowley <david.row...@2ndquadrant.com> writes: >>> hmm. I'm having trouble understanding why this is a problem for Unique >>> joins, but not for join removal? > >> Ah, you know what, that's just mistaken. I was thinking that we >> short-circuited the join on the strength of the hash (or merge) quals >> only, but actually we check all the joinquals first. As long as the >> uniqueness proof uses only joinquals and not conditions that will end up >> as otherquals, it's fine. > > Actually, after thinking about that some more, it seems to me that there > is a performance (not correctness) issue here: suppose that we have > something like > > select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y > > If there's a unique index on t2.x, we'll be able to mark the join > inner-unique. However, short-circuiting would only occur after > finding a row that passes both joinquals. If the y condition is > true for only a few rows, this would pretty nearly disable the > optimization. Ideally we would short-circuit after testing the x > condition only, but there's no provision for that.
I've attached a patch which implements this, though only for MergeJoin, else I'd imagine we'd also need to ensure all proofs used for testing the uniqueness were also hash-able too. I added some XXX comments in analyzejoin.c around the mergeopfamilies == NIL tests to mention that Merge Join depends on all the unique proof quals having mergeopfamilies. This also assumes we'll never use some subset of mergejoin-able quals for a merge join, which could be an interesting area in the future, as we might have some btree index on a subset of those columns to provide pre-sorted input. In short, it's a great optimisation, but I'm a little scared we might break it one day. Implementing this meant removing the match_first_row_only being set for JOIN_SEMI, as this optimisation only applies to unique_inner and not JOIN_SEMI alone. This also means we should be checking for unique properties on JOIN_SEMI now too, which I've enabled. Also, I've removed all jointype checks from innerrel_is_unique(). I took a bit of time to think about JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER. I think these are fine too, in fact one of the regression test plans moved away from using a Semi Join due to proving that the inner side was unique. That could perhaps allow a better plan, since the join order can be swapped. I'd really like someone else to have a think about that too, just to make sure I've not blundered that. I've put the join removal code back to the way it was before. As I mentioned, we can't use the caches here since we're also using additional quals for proofs in this case. I wasn't sure if I should add some regression tests which exercises MergeJoin a bit to test the new optimisation. Any thoughts on that? David -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Description: Binary data
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers