Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-08 Thread Ashutosh Bapat
On Wed, Nov 8, 2017 at 3:18 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think it would be a good idea, as Thomas says, to order the qual
>> clauses at an earlier stage and then remember our decision.  However,
>> we have to think about whether that's going to increase planning time
>> in a noticeable way.  I wonder why we currently postpone this until
>> such a late phase of planning.
>
> Because (1) up to now there's been no need to consider the qual ordering
> till later, and (2) re-doing that sort for each path seemed unduly
> expensive.  If we're to try to estimate whether later quals will be
> reached, then sure the ordering becomes important.  I'm still concerned
> about (2) though.  If there were a way to pre-sort the quals once for
> all paths, the problem goes away, but I don't think that works ---
> indexscans may segregate some quals, and in join cases different paths
> will actually have different sets of quals they need to check depending
> on the join order.
>

Looking at order_qual_clauses(), we can say that a set of quals q1
 qn are ordered the same irrespective of the set of clauses they
are subset of. E.g. if {q1 .. qn} is subset of Q (ordered as Qo) and
also Q' (ordered as Q'o) the order in which they appear in Qo and Q'o
is same. So, even if different paths segregate the clauses in
different set, within each set the order is same. FWIW, we can order
all clauses in largest set once and use that order every time. Albeit
we will have to remember the order somewhere OR make the separator
routine retain the order in the larger set, which I guess is true
about all separator functions.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-07 Thread Tom Lane
Robert Haas  writes:
> I think it would be a good idea, as Thomas says, to order the qual
> clauses at an earlier stage and then remember our decision.  However,
> we have to think about whether that's going to increase planning time
> in a noticeable way.  I wonder why we currently postpone this until
> such a late phase of planning.

Because (1) up to now there's been no need to consider the qual ordering
till later, and (2) re-doing that sort for each path seemed unduly
expensive.  If we're to try to estimate whether later quals will be
reached, then sure the ordering becomes important.  I'm still concerned
about (2) though.  If there were a way to pre-sort the quals once for
all paths, the problem goes away, but I don't think that works ---
indexscans may segregate some quals, and in join cases different paths
will actually have different sets of quals they need to check depending
on the join order.

So the bottom line here is how much is it going to cost us to add this
additional refinement in cost estimation, and is it worth it given our
extremely poor level of accuracy in expression cost estimation.  Color
me dubious.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-07 Thread Robert Haas
On Mon, Nov 6, 2017 at 5:19 AM, Ashutosh Bapat
 wrote:
> IIRC, only thing that changes between plan time quals and execution
> time quals is constaint folding of constant parameters. But I don't
> think we change the selectivity estimates when that's done. At the
> same time, I don't think we should make a lot of effort to make sure
> that the order used during the estimation is same as the order at the
> execution; we are anyway estimating. There can always be some
> difference between what's estimated and what actually happens.

I don't think that's a good justification for allowing the order to
vary.  It's true that, at execution time, we may find row counts and
selectivities to be different than what we predicted, but that is a
case of the real data being different from our idealized data -- which
is difficult to avoid in general.  Allowing our actual decisions to be
different from the decisions we thought we would make seems like
programmer sloppiness.  It would also be very confusing if the planner
uses one ordering to estimate the cost and then a different order at
execution time and in the EXPLAIN ANALYZE output.  How could anybody
understand what had happened in such a case?

I think it would be a good idea, as Thomas says, to order the qual
clauses at an earlier stage and then remember our decision.  However,
we have to think about whether that's going to increase planning time
in a noticeable way.  I wonder why we currently postpone this until
such a late phase of planning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-06 Thread Ashutosh Bapat
On Mon, Nov 6, 2017 at 10:01 AM, Thomas Munro
 wrote:
>
> This idea seems to makes intuitive sense.  I see that you use
> order_qual_clauses() to know what order they'll run in, so I'm
> wondering if there is any reason we shouldn't do it up front and keep
> it during path building, instead of running it again at plan creation
> time.  Is there some way it could ever produce a different result at
> the two times?

IIRC, only thing that changes between plan time quals and execution
time quals is constaint folding of constant parameters. But I don't
think we change the selectivity estimates when that's done. At the
same time, I don't think we should make a lot of effort to make sure
that the order used during the estimation is same as the order at the
execution; we are anyway estimating. There can always be some
difference between what's estimated and what actually happens.

> Why not also apply this logic to qpquals of joins,
> foreign scans, subplans?  That is, instead of replacing cost_qual_eval
> with this code for baserels, I wonder if we should teach
> cost_qual_eval how to do this so those other users could also benefit
> (having perhaps already ordered the qual clauses earlier).

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-05 Thread Thomas Munro
On Mon, Sep 11, 2017 at 7:43 PM, Yuto Hayamizu  wrote:
> Suppose that there are three qual clauses in a scan node, current
> postgres estimates per-tuple cost of the filter to be:
>cost(A) + cost(B) + cost(C)
>
> And basic idea of the attached patch is:
>cost(A) + clauselist_selectivity({A}) * cost(B) +
> clauselist_selectivity({A, B}) * cost(C)

I am no planner expert and I haven't tested or studied the patch in
detail, but here is some feedback for what it's worth.

This idea seems to makes intuitive sense.  I see that you use
order_qual_clauses() to know what order they'll run in, so I'm
wondering if there is any reason we shouldn't do it up front and keep
it during path building, instead of running it again at plan creation
time.  Is there some way it could ever produce a different result at
the two times?  Why not also apply this logic to qpquals of joins,
foreign scans, subplans?  That is, instead of replacing cost_qual_eval
with this code for baserels, I wonder if we should teach
cost_qual_eval how to do this so those other users could also benefit
(having perhaps already ordered the qual clauses earlier).

This is one of those patches that risks having an impact on many query
plans.  Yikes.  Of all the regression test queries, only
updatable_views complained though, and that involves leakproof
functions.  I see that those get some kind of special treatment in
order_qual_clauses().

+ ordered_clauses = order_qual_clauses(root, rel->baserestrictinfo);
+ clause_list = ordered_clauses;

Is clause_list necessary?  Can't you just use ordered_clauses for the
outer and inner loops?

+ List *clause_list_for_sel = NULL;

The convention is to use NIL for empty lists (a nod to the Lisp
machine they prototyped this project on).

+ /* Make a temporary clause list for selectivity calcuation */

s/calcuation/calculation/

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-09-17 Thread David Fetter
On Mon, Sep 11, 2017 at 04:43:46PM +0900, Yuto Hayamizu wrote:
> Hi hackers,
> 
> Currently, cost of a filter with multiple clauses is estimated by
> summing up estimated cost of each clause.  As long as a filter
> consists of simple clauses and its cost is fairly small, it works
> fine. However, when there exists some heavy clauses (like SubQuery or
> user-defined functions) and most of tuples are filtered by other
> simple clauses, total cost is likely to be overestimated.  An attached
> patch mitigates this overestimation by using selectivity estimation of
> subsets of clauses in a filter.

I've taken the liberty of adding this to the upcoming commitfest.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers