[ getting back to this at long last ] David Rowley <david.row...@2ndquadrant.com> writes: > However, having said that, I'm not sure why we'd need outer_unique > available so we'd know that we could skip mark/restore. I think > inner_unique is enough for this purpose. Take the comment from > nodeMergejoin.c:
> * outer: (0 ^1 1 2 5 5 5 6 6 7) current tuple: 1 > * inner: (1 ^3 5 5 5 5 6) current tuple: 3 > ... > * > * Consider the above relations and suppose that the executor has > * just joined the first outer "5" with the last inner "5". The > * next step is of course to join the second outer "5" with all > * the inner "5's". This requires repositioning the inner "cursor" > * to point at the first inner "5". This is done by "marking" the > * first inner 5 so we can restore the "cursor" to it before joining > * with the second outer 5. The access method interface provides > * routines to mark and restore to a tuple. > If only one inner "5" can exist (unique_inner), then isn't the inner > side already in the correct place, and no restore is required? Hmm ... let me see whether I have my head wrapped around this correctly. What I had in mind is a different optimization. When the inner is not unique, we can't know we're done joining the first outer "5" until we reach the inner "6". At that point, what we normally do is advance the outer by one row and back up the inner to the mark point (the first inner "5"). But the only reason to back up is that the new outer row might join to the same inner rows the previous one did. If we know the outer side is unique, then those inner rows can't join to the new outer row so no need to back up. So this requires no real change to the mergejoin algorithm, we just skip the mark and restore steps. I think what you're saying is that, if we know the inner side is unique, we can also skip mark/restore overhead; but it would work a bit differently. After joining two rows with equal keys, instead of advancing the inner as per standard algorithm, we'd need to advance the outer side. (This is OK because we know the next inner can't join to this same outer. But we don't know if the next outer can join to this inner.) We advance inner only when current inner < current outer, so we're done with that inner and need never rewind. So this is a more fundamental algorithm change but it gets the same performance benefit. So the question is, if we can skip mark/restore overhead when we know that either input is unique, is it necessary for the planner to account for both ways explicitly? Given the general symmetry of mergejoins, it might be okay for the planner to preferentially generate plans with the unique input on the inside, and not worry about optimizing in this way when it's on the outside. Now, JOIN_SEMI and JOIN_ANTI cases are *not* symmetric, since we don't implement reverse-semi or reverse-anti join modes. But I don't know that there's anything to be won by noticing that the outer side is unique in those cases. I think probably we can apply the same technique of advancing outer not inner, and never rewinding, in those join modes whether or not either input is known unique. Another asymmetry is that if one input is likely to be empty, it's better to put that one on the outside, because if it is empty we don't have to fetch anything from the other input (thus saving its startup cost). But the planner doesn't account for that anyway because it never believes non-dummy relations are truly empty; so even if it's getting this right today, it's purely by chance. I can't think of any other asymmetries offhand. In short, I think you are right that it might be enough to account for inner uniqueness only, and not worry about it for the outer side, even for mergejoin. This means my previous objection is wrong and we don't really have to allow for a future extension of that kind while choosing the notation the planner uses. So ... would you rather go back to the previous notation (extra JoinTypes), or do you like the separate boolean better anyway? Sorry for jerking you back and forth like this, but sometimes the correct path isn't apparent from the start. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers