On 6 December 2015 at 06:07, Tom Lane <t...@sss.pgh.pa.us> wrote:

> David Rowley <david.row...@2ndquadrant.com> writes:
> > As of today these Equivalence Classes only incorporate expressions which
> > use the equality operator, but what if the above query had been written
> as:
>
> > select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;
>
> > Should we not be able to assume that t2.id is also <= 10?



This sort of thing has been suggested multiple times before, and I've
> rejected it every time on the grounds that it would likely add far more
> planner cycles than it'd be worth, eg, time would be spent on searches for
> matching subexpressions whether or not anything was learned (and often
> nothing would be learned).


I guess if it keeps coming up then people must be hitting this limitation.
Perhaps continually rejecting it based on your original opinion is not the
best way to move forward with improving the query planner.


>   While I've only read your description of the
> patch not the patch itself, the search methodology you propose seems
> pretty brute-force and unlikely to solve that issue.  It's particularly
> important to avoid O(N^2) behaviors when there are N expressions ...
>
>

Likely it would be possible to do something to improve on that.
Perhaps if the list of filter clauses grows beyond a certain number then a
hash table can be built on the ec_members list with the em_expr as the key
and the EquivalenceClass as the data. Although likely we don't currently
have anything available for generating hash values for Expr nodes. On
checking it seems that 32 is the estimated tipping point for hashing
in find_join_rel(), perhaps something similar would suit this.


> Another issue that would need consideration is how to avoid skewing
> planner selectivity estimates with derived clauses that are fully
> redundant with other clauses.  The existing EC machinery is mostly
> able to dodge that problem by generating just a minimal set of equality
> clauses from an EC, but I don't see how we'd make that work here.
>
>
Could you give me an example of where this is a problem? I've tried fooling
the planner into giving me bad estimates, but I've not managed to.

# explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and
t1.id < 1000;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using t1_pkey on t1  (cost=0.29..8.49 rows=9 width=8) (actual
time=0.155..0.160 rows=9 loops=1)
   Index Cond: ((id < 10) AND (id < 100) AND (id < 1000))

I'd say the t1.id < 100 AND t1.id < 1000 are fully redundant here. The
estimate given by the planner is bang-on.

Perhaps you're talking about another column making the pushed down qual
redundant?  if so, is the current eclass code not prone to the exact same
problem?

I'm also wondering why you want to limit it to comparisons to constants;
> that seems rather arbitrary.
>
>
Do you have other useful cases in mind?
The other one I considered was "expr op expr" clauses, where both those
exprs were in eclasses. For example:

select * from t1 inner join t2 on t1.col1=t2.col1 and t1.col2=t2.col2 where
t1.col1 < t1.col2;

I'd imagine that would have a narrower use case. I simply stopped with
"Expr op Const" because I though that would be more common and less planner
overhead to detect. Giving that below you also raise concerns that "expr op
const" is likely not that useful in enough cases, then I'm not holding up
too much hope of adding more complexity to the detection mechanism for less
common wins.


> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
> that much; it's certainly far less useful than an equality condition.
> It's not difficult to imagine that this would often be a net drag on
> runtime performance (never mind planner performance) by doing nothing
> except creating additional filter conditions the executor has to check.
> Certainly it would be valuable to know this if it let us exclude some
> partition of a table, but that's only going to happen in a small minority
> of queries.
>
>
I'd have thought that my link to a thread of a reported 1100 to 2200 times
performance improvement might have made you think twice about claiming the
uselessness of this idea.

I'm also quite surprised at you mentioning partitions here. Personally I'd
be thinking of indexes long before thinking of partitions. I don't think I
need to remind you that btree indexes handle >= and <= just fine. It's not
all that hard to imagine that if they're using that column for a join
condition and are serious about performance that they just might also have
an index defined on that column too. I'd imagine the only case we might
have to worry about is when the selectivity of the pushed qual is getting
close to 1. Perhaps the RestrictInfos could be marked as "optional"
somehow, and we could simply remove them when their selectivity estimates
are too high.

I'm not necessarily opposed to doing anything in this area, but so far
> I've not seen how to do it in a way that is attractive when planner
> complexity, performance, and end results are all considered.
>

Glad to hear it! Otherwise it would be a real shame to miss out on such
great wins during execution time.

--
 David Rowley                   http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
 PostgreSQL Development, 24x7 Support, Training & Services

Reply via email to