On Mon, Dec 16, 2013 at 6:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Mon, Dec 16, 2013 at 2:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> The problem is that that optimization is a crock; see the comments
>>> for create_or_index_quals().  We can't just turn it loose to CNF-ify
>>> every OR it might find.  The case that we support at the moment is
>>> to CNF-ify whichever single OR condition looks like the best win,
>>> and it's hard to see how to do that without any index knowledge.
>
>> Could we get by without actually converting to CNF?
>
> The hard part is not extracting the partial qual.  The hard part is
> trying to make sure that adding this entirely-redundant scan qual doesn't
> catastrophically degrade join size estimates.

OK, I had a feeling that's where the problem was likely to be.  Do you
have any thoughts about a more principled way of solving this problem?

I mean, off-hand, it's not clear to me that the comments about this
being a MAJOR HACK aren't overstated.  I mean, if we expect a join
qual for {X Y} to have a given selectivity, but then we pre-filter X
with a clause that is deliberately redundant with that join qual, then
the join qual does indeed become less selective when view as applying
to the surviving rows.  This does not strike me as very much different
from the oft-encountered problem of estimating selectivity for a = 1
AND b = 1, where a and b are correlated.  Whichever qual we apply
first changes the data distribution such that the selectivity of the
second qual is not the same as it would have been when applied to the
entirety of the data.  Estimating it that way would not be a hack; it
would be reality.

Now, it might be true that frobbing what is intended as a cache based
on the knowledge that the cache will never be flushed is a hack.

> The hack of making an
> inverse adjustment to the original OR clause's selectivity works, more or
> less, for a single join OR condition.  I don't think it works if there's
> several modified OR conditions (possibly covering different sets of
> relations).

I might be missing something, but I suspect it works fine if every
path for the relation is generating the same rows.  The partial qual
is definitely going to be applied before the join qual, so the join
qual will surely be hitting only data that's been pre-filtered by the
partial qual, and so its selectivity will be correspondingly more.
Where it seems to me that you'd run in to trouble is if we created one
path that doesn't bother enforcing the partial qual (relying on the
fact that it will be applied post-join) and another path that does.
Now you've got the same selectivity estimate for the join in two cases
where the incoming data distribution is significantly different, which
is bad.

Do you see another danger?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to