[ getting back to this at long last ]
David Rowley <david.row...@2ndquadrant.com> writes:
> However, having said that, I'm not sure why we'd need outer_unique
> available so we'd know that we could skip mark/restore. I think
> inner_unique is enough for this purpose. Take the comment from
> * outer: (0 ^1 1 2 5 5 5 6 6 7) current tuple: 1
> * inner: (1 ^3 5 5 5 5 6) current tuple: 3
> * Consider the above relations and suppose that the executor has
> * just joined the first outer "5" with the last inner "5". The
> * next step is of course to join the second outer "5" with all
> * the inner "5's". This requires repositioning the inner "cursor"
> * to point at the first inner "5". This is done by "marking" the
> * first inner 5 so we can restore the "cursor" to it before joining
> * with the second outer 5. The access method interface provides
> * routines to mark and restore to a tuple.
> If only one inner "5" can exist (unique_inner), then isn't the inner
> side already in the correct place, and no restore is required?
Hmm ... let me see whether I have my head wrapped around this correctly.
What I had in mind is a different optimization. When the inner is not
unique, we can't know we're done joining the first outer "5" until we
reach the inner "6". At that point, what we normally do is advance the
outer by one row and back up the inner to the mark point (the first inner
"5"). But the only reason to back up is that the new outer row might join
to the same inner rows the previous one did. If we know the outer side is
unique, then those inner rows can't join to the new outer row so no need
to back up. So this requires no real change to the mergejoin algorithm,
we just skip the mark and restore steps.
I think what you're saying is that, if we know the inner side is unique,
we can also skip mark/restore overhead; but it would work a bit
differently. After joining two rows with equal keys, instead of advancing
the inner as per standard algorithm, we'd need to advance the outer side.
(This is OK because we know the next inner can't join to this same outer.
But we don't know if the next outer can join to this inner.) We advance
inner only when current inner < current outer, so we're done with that
inner and need never rewind. So this is a more fundamental algorithm
change but it gets the same performance benefit.
So the question is, if we can skip mark/restore overhead when we know that
either input is unique, is it necessary for the planner to account for
both ways explicitly? Given the general symmetry of mergejoins, it might
be okay for the planner to preferentially generate plans with the unique
input on the inside, and not worry about optimizing in this way when it's
on the outside.
Now, JOIN_SEMI and JOIN_ANTI cases are *not* symmetric, since we don't
implement reverse-semi or reverse-anti join modes. But I don't know that
there's anything to be won by noticing that the outer side is unique in
those cases. I think probably we can apply the same technique of
advancing outer not inner, and never rewinding, in those join modes
whether or not either input is known unique.
Another asymmetry is that if one input is likely to be empty, it's better
to put that one on the outside, because if it is empty we don't have to
fetch anything from the other input (thus saving its startup cost). But
the planner doesn't account for that anyway because it never believes
non-dummy relations are truly empty; so even if it's getting this right
today, it's purely by chance.
I can't think of any other asymmetries offhand.
In short, I think you are right that it might be enough to account for
inner uniqueness only, and not worry about it for the outer side, even
for mergejoin. This means my previous objection is wrong and we don't
really have to allow for a future extension of that kind while choosing
the notation the planner uses.
So ... would you rather go back to the previous notation (extra
JoinTypes), or do you like the separate boolean better anyway?
Sorry for jerking you back and forth like this, but sometimes the
correct path isn't apparent from the start.
regards, tom lane
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: