On 3/19/17 2:32 PM, Tom Lane wrote:
Jim Nasby <jim.na...@openscg.com> writes:
Having thought about it, I share Tom's concerns. And now I'm worried
about what happens if there are multiple separate OR clauses. I guess
those would be handled by separate UNIONs?

As proposed, the patch would try to optimize by splitting each OR clause
independently, and would choose whichever way gave the best cost estimate.
I'm not sure it's possible to do better than that, and even if it is,
I think improving it could be left for later.

Agreed.

I'd also considered an approach of de-duping on the basis of all relation
ctids, while allowing a rel's ctid to be returned as NULL from a UNION arm
in which the rel was eliminated entirely.  But that doesn't fix it,
because in this example the first arm would return (a.ctid, NULL) while
the second arm would return (NULL, b.ctid), so that the UNION step would
still fail to detect any duplication.  To make it work, we'd have to not
eliminate the joins, which would pretty much defeat the usefulness of
the optimization for your original example case.

It might still be worth-while in some circumstances. In your example, if there were these indexes:

a__id ON a(id), a__x ON a(x)
b__id ON b(id), b__y ON b(y)

then it might be faster to nested loop a__x=42 to b__id=a.id and union that with b__y=43 nested to a__id=b.id.

That said, now isn't the time to be adding more complexity.

So full joins definitely break this whole optimization.  I think it's okay
with left joins though, because when starting from "a left join b" it will
never be possible to remove "a" so we'll always include a.ctid in the
UNION de-duping step.  If b is removed in some arm, then it must be true
that we get exactly one left-join output row per a row, regardless of the
contents of b, in that arm.  The argument for the patch being okay is
essentially that we must get exactly one left-join output row per a row,
regardless of the contents of b, in *every* arm, because the various
modified versions of the OR clause can't affect that conclusion.  In some
of the arms we might not remove b, and we might even be able to reduce the
left join to an inner join, but there should still be no more than one
join output row per a row.  That being the case, it should be sufficient
to de-dup using a.ctid while ignoring b.ctid.

The only case I can think of is: would it be possible (perhaps not today; maybe in the future) for other parts of the query to affect join elimination? I can't conceive of how that might happen, but if it did then it's possible that the elimination would work differently with the UNION than it would with an OR.

The comment on join_is_removable() does mention that there's other potentially interesting cases that we can't handle now; it's maybe worth mentioning

Other than that, I can't see any issues with your logic.

Any clearer yet?

Absolutely. I think it would be very valuable to include that with the initial comment in planunionor.c. Join reduction and removal is already tricky enough to wrap your head around.

Other comments:

+  * is retty mechanical, but we can't do it until we have a RelOptInfo for the

s/retty/pretty/


I suspect that in many systems single-table queries are far more common than CTEs, so maybe it's worth reversing those two tests in split_join_or_clauses().


For the Unique path, it would be nice if the code did what would be necessary to consider a TID hash join, since that means a user could create the appropriate operator and it would just be picked up. Certainly not worth much effort at this point though.

+       /*
+        * Must not have any volatile functions in FROM or WHERE (see notes at
+        * head of file).
+        */
+       if (contain_volatile_functions((Node *) parse->jointree))

Is there by chance anywhere else that needs to check that? Maybe worth adding the info to the Query struct if so.

+        * We insist that all baserels used in the query be plain relations, so

Dumb question... views have already be decomposed at this point, right?

Perhaps less dumb question... what happens if the original query already had setOps? AIUI setOps work has already been done by the time split_join_or_clauses() happens; I just want to check that that's OK.

I'm not sure a GUC is worth it... I suspect that any query with multiple rels and an OR condition is going to be expensive enough that whatever additional plan time there is won't be noticeable.

I've verified that the patch still applies and make check-world is clean.
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to