Re: [HACKERS] propose to pushdown qual into EXCEPT

Tom Lane Fri, 23 Dec 2016 16:09:07 -0800

Alvaro Herrera <[email protected]> writes:
> Tom Lane wrote:
>> That is not an adequate argument for such a change being okay.  Postgres,
>> with its extensible set of datatypes, has to be much more careful about
>> the semantic soundness of optimizations than some other DBs do.


> Can we use the record_image_ops mechanism here?

Don't see how that helps?

But I went through the logic and think I convinced myself that pushdown
to LHS only is okay, and no more questionable than what happens for
INTERSECT.  Argue thus:

Assume a group of non-distinct-according-to-the-setop rows includes m rows
from left-hand side, n rows from right-hand side, of which the upper qual
would pass m' <= m and n' <= n rows respectively.

UNION: unoptimized setop produces 1 row (since we may assume the group is
nonempty).  That row might or might not pass the upper qual, so we get
0 or 1 row out.  User can be certain about what will happen only if
m'=m and n'=n (row must pass qual) or m'=n'=0 (row must not pass qual).

After optimizing by pushing qual down: we get one row if m' > 0 or n' > 0.
This is equivalent to the unoptimized behavior if we assume that the setop
always magically selects a representative row passing the qual if
possible, so OK.

INTERSECT: unoptimized setop produces 1 row if m>0 and n>0, which again
might or might not pass the upper qual, so 0 or 1 row out.

Optimized case: we get 1 row if m' > 0 and n' > 0.  This is equivalent to
unoptimized behavior if you allow that the representative row could have
been taken from either input.  (For example, if m' = m, n > 0, and n' = 0,
producing 0 rows is legit only if the hypothetical representative row was
taken from RHS.)  That's legal per SQL, though I'm unsure offhand if our
implementation actually would do that.

INTERSECT ALL: unoptimized setop produces min(m,n) rows, which might or
might not pass the upper qual, so anywhere from 0 to min(m,n) rows out.
(It is probably possible to set a tighter upper bound when m', n' are
small, but I've not bothered to reason that out.)

Optimized case: we get min(m', n') rows out, which seems fine; again
it could be produced by the unoptimized implementation selecting
representative row(s) that happen to have that many members passing the
upper qual.

But actually, our implementation doesn't produce some random collection
of min(m,n) elements of the row group.  What you get is min(m,n) copies
of a single representative row; therefore, assuming the upper qual is
nonvolatile, either they all pass or none do, so the actual output is
either 0 or min(m,n) duplicates.  (The spec says "Let R be a row that is a
duplicate of some row in ET1 or of some row in ET2 or both.  Let m be the
number of duplicates of R in ET1 and let n be the number of duplicates of
R in ET2, where m >= 0 and n >= 0.  [For INTERSECT ALL] the number of
duplicates of R that T contains is the minimum of m and n."  So depending
on what you want to assume that "duplicates of R" means, you could argue
that the spec actually requires our implementation's behavior here.
It certainly doesn't appear to forbid it.)  The optimized output is
therefore distinguishable from unoptimized, because it could produce a
number of rows that the unoptimized implementation cannot.  But nobody's
complained about it, and it's hard to see there being a legitimate beef.

EXCEPT: unoptimized setop produces 1 row if m>0 and n=0, which again
might or might not pass the upper qual, so 0 or 1 row out.

If we were to push qual into both sides: we'd get one row if m' > 0
and n' = 0.  So if the qual passes some of LHS and none of RHS, we'd
get a row, breaking SQL semantics.

If we push into LHS only: we get one row if m' > 0 and n = 0.  Certainly
then m > 0.  So this is equivalent to behavior where the setop magically
selects a representative LHS row passing the qual.  OTOH, if m' = 0, then
the setop wouldn't produce a row, but if it had that row would have failed
the upper qual, so behavior is still equivalent.

EXCEPT ALL: unoptimized setop produces max(m-n,0) rows, which might or
might not pass the upper qual, so anywhere from 0 to max(m-n,0) rows out.
(Again it is probably possible to set a tighter upper bound when m', n'
are small.)

Again, pushing into RHS would allow n to be reduced, perhaps allowing more
rows to be returned than SQL would allow, so we can't do that.

If we push into LHS only: we get max(m'-n,0) rows.  Again, it would be
possible for the unoptimized setop to select its representative rows in
such a way that this happens (it just has to prefer LHS rows passing the
qual over those that don't).

Again, this doesn't quite match reality of the current setop
implementation, since it returns max(m-n,0) identical rows, not that
many random members of the current group.  But it's still hard to believe
anyone would have a legitimate beef about it.


tl;dr: I now think what the patch proposes to do is legit.  There's a heck
of a lot of work to be done on the comments it's falsified, though.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] propose to pushdown qual into EXCEPT

Reply via email to