Re: [HACKERS] Performance improvement for joins where outer side is unique

David Rowley Mon, 30 Jan 2017 16:10:53 -0800

On 28 January 2017 at 05:44, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> David Rowley <david.row...@2ndquadrant.com> writes:
>>> hmm. I'm having trouble understanding why this is a problem for Unique
>>> joins, but not for join removal?
>
>> Ah, you know what, that's just mistaken.  I was thinking that we
>> short-circuited the join on the strength of the hash (or merge) quals
>> only, but actually we check all the joinquals first.  As long as the
>> uniqueness proof uses only joinquals and not conditions that will end up
>> as otherquals, it's fine.
>
> Actually, after thinking about that some more, it seems to me that there
> is a performance (not correctness) issue here: suppose that we have
> something like
>
>         select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y
>
> If there's a unique index on t2.x, we'll be able to mark the join
> inner-unique.  However, short-circuiting would only occur after
> finding a row that passes both joinquals.  If the y condition is
> true for only a few rows, this would pretty nearly disable the
> optimization.  Ideally we would short-circuit after testing the x
> condition only, but there's no provision for that.


I've attached a patch which implements this, though only for
MergeJoin, else I'd imagine we'd also need to ensure all proofs used
for testing the uniqueness were also hash-able too. I added some XXX
comments in analyzejoin.c around the mergeopfamilies == NIL tests to
mention that Merge Join depends on all the unique proof quals having
mergeopfamilies.  This also assumes we'll never use some subset of
mergejoin-able quals for a merge join, which could be an interesting
area in the future, as we might have some btree index on a subset of
those columns to provide pre-sorted input. In short, it's a great
optimisation, but I'm a little scared we might break it one day.

Implementing this meant removing the match_first_row_only being set
for JOIN_SEMI, as this optimisation only applies to unique_inner and
not JOIN_SEMI alone. This also means we should be checking for unique
properties on JOIN_SEMI now too, which I've enabled.

Also, I've removed all jointype checks from innerrel_is_unique(). I
took a bit of time to think about JOIN_UNIQUE_OUTER and
JOIN_UNIQUE_INNER. I think these are fine too, in fact one of the
regression test plans moved away from using a Semi Join due to proving
that the inner side was unique. That could perhaps allow a better
plan, since the join order can be swapped. I'd really like someone
else to have a think about that too, just to make sure I've not
blundered that.

I've put the join removal code back to the way it was before. As I
mentioned, we can't use the caches here since we're also using
additional quals for proofs in this case.

I wasn't sure if I should add some regression tests which exercises
MergeJoin a bit to test the new optimisation. Any thoughts on that?

David

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

unique_joins_2017-01-31.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance improvement for joins where outer side is unique

Reply via email to