On 3/16/17 11:54 AM, David Steele wrote:
On 2/14/17 4:03 PM, Tom Lane wrote:
Jim Nasby <jim.na...@bluetreble.com> writes:
On 2/14/17 1:18 PM, Tom Lane wrote:
One point that could use further review is whether the de-duplication
algorithm is actually correct.  I'm only about 95% convinced by the
argument I wrote in planunionor.c's header comment.

I'll put some thought into it and see if I can find any holes. Are you
only worried about the removal of "useless" rels or is there more?

Well, the key point is whether it's really OK to de-dup on the basis
of only the CTIDs that are not eliminated in any UNION arm.  I was
feeling fairly good about that until I thought of the full-join-to-
left-join-to-no-join conversion issue mentioned in the comment.
Now I'm wondering if there are other holes; or maybe I'm wrong about
that one and it's not necessary to be afraid of full joins.

This patch applies cleanly (with offsets) and compiles at cccbdde.

Jim, have you had time to think about this?  Any insights?

Having thought about it, I share Tom's concerns. And now I'm worried about what happens if there are multiple separate OR clauses. I guess those would be handled by separate UNIONs?

I'm also finding it a bit hard to follow the comment Tom mentioned. I'm pretty sure I understand what's going on until this part:

The identical proof can be expected to apply
+  * in other arms, except in an arm that references that rel in its version
+  * of the OR clause.  But in such an arm, we have effectively added a
+  * restriction clause to what is known in other arms, which means that the
+  * set of rows output by that rel can't increase compared to other arms.

AIUI, this is describing a case something like this:

SELECT child.blah FROM child LEFT JOIN parent USING(parent_id)
  WHERE child.foo AND (child.baz=1 or child.baz=2)

given that parent.parent_id is unique. Except for these concerns, there would need to be a complex OR somewhere in here that sometimes referenced parent and sometimes didn't, such as

  WHERE child.foo AND (child.baz=1 OR parent.foo=3)

But I'm not following the logic here (very possibly because I'm wrong about what I said above):

+  * Therefore the situation in such an arm must be that including the rel
+  * could result in either zero or one output row, rather than exactly one
+  * output row as in other arms.  So we still don't need to consider it for
+  * de-duplication.

I'm definitely not certain about the rest of it.

Tom, could you expand the description some, especially with some examples?
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to