Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Wed, Nov 8, 2017 at 3:18 AM, Tom Lane wrote:
> Robert Haas writes:
>> I think it would be a good idea, as Thomas says, to order the qual
>> clauses at an earlier stage and then remember our decision. However,
>> we have to think about whether that's going to increase planning time
>> in a noticeable way. I wonder why we currently postpone this until
>> such a late phase of planning.
>
> Because (1) up to now there's been no need to consider the qual ordering
> till later, and (2) re-doing that sort for each path seemed unduly
> expensive. If we're to try to estimate whether later quals will be
> reached, then sure the ordering becomes important. I'm still concerned
> about (2) though. If there were a way to pre-sort the quals once for
> all paths, the problem goes away, but I don't think that works ---
> indexscans may segregate some quals, and in join cases different paths
> will actually have different sets of quals they need to check depending
> on the join order.

Looking at order_qual_clauses(), we can say that a set of quals q1 .. qn
is ordered the same irrespective of the larger set of clauses it is a
subset of. E.g. if {q1 .. qn} is a subset of Q (ordered as Qo) and also
of Q' (ordered as Q'o), the order in which those quals appear in Qo and
Q'o is the same. So, even if different paths segregate the clauses into
different sets, within each set the order is the same. FWIW, we could
order all the clauses in the largest set once and use that order every
time. We would have to remember the order somewhere, or make the
separator routines retain the order of the larger set, which I guess is
already true of all the separator functions.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
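The property Ashutosh describes can be seen with a toy standalone
program: sorting any subset with the same deterministic comparator
preserves the relative order the elements have in the sorted superset.
(A sketch with made-up qual names and costs, not planner code;
order_qual_clauses() additionally honors security levels for leakproof
quals, which this ignores.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    const char *name;
    double      cost;           /* stand-in for per-tuple eval cost */
} Qual;

/* compare by cost, breaking ties deterministically by name */
static int
cmp_qual(const void *a, const void *b)
{
    const Qual *qa = (const Qual *) a;
    const Qual *qb = (const Qual *) b;

    if (qa->cost != qb->cost)
        return (qa->cost < qb->cost) ? -1 : 1;
    return strcmp(qa->name, qb->name);
}

static void
sort_and_print(Qual *quals, size_t n)
{
    qsort(quals, n, sizeof(Qual), cmp_qual);
    for (size_t i = 0; i < n; i++)
        printf("%s ", quals[i].name);
    printf("\n");
}

int
main(void)
{
    Qual        superset[] = {{"q3", 5.0}, {"q1", 0.01}, {"q4", 100.0}, {"q2", 2.5}};
    Qual        subset[] = {{"q3", 5.0}, {"q1", 0.01}, {"q4", 100.0}};

    sort_and_print(superset, 4);    /* prints: q1 q2 q3 q4 */
    sort_and_print(subset, 3);      /* prints: q1 q3 q4 -- same relative order */
    return 0;
}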
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Robert Haas writes:
> I think it would be a good idea, as Thomas says, to order the qual
> clauses at an earlier stage and then remember our decision. However,
> we have to think about whether that's going to increase planning time
> in a noticeable way. I wonder why we currently postpone this until
> such a late phase of planning.

Because (1) up to now there's been no need to consider the qual ordering
till later, and (2) re-doing that sort for each path seemed unduly
expensive. If we're to try to estimate whether later quals will be
reached, then sure the ordering becomes important. I'm still concerned
about (2) though. If there were a way to pre-sort the quals once for
all paths, the problem goes away, but I don't think that works ---
indexscans may segregate some quals, and in join cases different paths
will actually have different sets of quals they need to check depending
on the join order.

So the bottom line here is: how much is it going to cost us to add this
additional refinement in cost estimation, and is it worth it given our
extremely poor level of accuracy in expression cost estimation? Color
me dubious.

			regards, tom lane
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Nov 6, 2017 at 5:19 AM, Ashutosh Bapat wrote:
> IIRC, the only thing that changes between plan time quals and execution
> time quals is constant folding of constant parameters. But I don't
> think we change the selectivity estimates when that's done. At the
> same time, I don't think we should make a lot of effort to make sure
> that the order used during the estimation is the same as the order at
> execution; we are anyway estimating. There can always be some
> difference between what's estimated and what actually happens.

I don't think that's a good justification for allowing the order to
vary. It's true that, at execution time, we may find row counts and
selectivities to be different than what we predicted, but that is a
case of the real data being different from our idealized data -- which
is difficult to avoid in general. Allowing our actual decisions to be
different from the decisions we thought we would make seems like
programmer sloppiness. It would also be very confusing if the planner
used one ordering to estimate the cost and then a different ordering
showed up at execution time and in the EXPLAIN ANALYZE output. How
could anybody understand what had happened in such a case?

I think it would be a good idea, as Thomas says, to order the qual
clauses at an earlier stage and then remember our decision. However,
we have to think about whether that's going to increase planning time
in a noticeable way. I wonder why we currently postpone this until
such a late phase of planning.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Nov 6, 2017 at 10:01 AM, Thomas Munro wrote:
>
> This idea seems to make intuitive sense. I see that you use
> order_qual_clauses() to know what order they'll run in, so I'm
> wondering if there is any reason we shouldn't do it up front and keep
> it during path building, instead of running it again at plan creation
> time. Is there some way it could ever produce a different result at
> the two times?

IIRC, the only thing that changes between plan time quals and execution
time quals is constant folding of constant parameters. But I don't
think we change the selectivity estimates when that's done. At the
same time, I don't think we should make a lot of effort to make sure
that the order used during the estimation is the same as the order at
execution; we are anyway estimating. There can always be some
difference between what's estimated and what actually happens.

> Why not also apply this logic to qpquals of joins,
> foreign scans, subplans? That is, instead of replacing cost_qual_eval
> with this code for baserels, I wonder if we should teach
> cost_qual_eval how to do this so those other users could also benefit
> (having perhaps already ordered the qual clauses earlier).

+1.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Sep 11, 2017 at 7:43 PM, Yuto Hayamizu wrote:
> Suppose that there are three qual clauses in a scan node. Current
> postgres estimates the per-tuple cost of the filter to be:
>
>    cost(A) + cost(B) + cost(C)
>
> The basic idea of the attached patch is:
>
>    cost(A) + clauselist_selectivity({A}) * cost(B)
>            + clauselist_selectivity({A, B}) * cost(C)

I am no planner expert and I haven't tested or studied the patch in
detail, but here is some feedback for what it's worth.

This idea seems to make intuitive sense. I see that you use
order_qual_clauses() to know what order they'll run in, so I'm
wondering if there is any reason we shouldn't do it up front and keep
it during path building, instead of running it again at plan creation
time. Is there some way it could ever produce a different result at
the two times?

Why not also apply this logic to qpquals of joins, foreign scans,
subplans? That is, instead of replacing cost_qual_eval with this code
for baserels, I wonder if we should teach cost_qual_eval how to do
this so those other users could also benefit (having perhaps already
ordered the qual clauses earlier).

This is one of those patches that risks having an impact on many
query plans. Yikes. Of all the regression test queries, only
updatable_views complained though, and that involves leakproof
functions. I see that those get some kind of special treatment in
order_qual_clauses().

+ ordered_clauses = order_qual_clauses(root, rel->baserestrictinfo);
+ clause_list = ordered_clauses;

Is clause_list necessary? Can't you just use ordered_clauses for the
outer and inner loops?

+ List *clause_list_for_sel = NULL;

The convention is to use NIL for empty lists (a nod to the Lisp
machine they prototyped this project on).

+ /* Make a temporary clause list for selectivity calcuation */

s/calcuation/calculation/

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Sep 11, 2017 at 04:43:46PM +0900, Yuto Hayamizu wrote:
> Hi hackers,
>
> Currently, the cost of a filter with multiple clauses is estimated by
> summing up the estimated cost of each clause. As long as a filter
> consists of simple clauses and its cost is fairly small, this works
> fine. However, when there are some heavy clauses (like a subquery or
> user-defined functions) and most tuples are filtered out by other,
> simple clauses, the total cost is likely to be overestimated. The
> attached patch mitigates this overestimation by using selectivity
> estimates of subsets of the clauses in a filter.

I've taken the liberty of adding this to the upcoming commitfest.

Best,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
[HACKERS] [PATCH] Overestimated filter cost and its mitigation
Hi hackers,

Currently, the cost of a filter with multiple clauses is estimated by
summing up the estimated cost of each clause. As long as a filter
consists of simple clauses and its cost is fairly small, this works
fine. However, when there are some heavy clauses (like a subquery or
user-defined functions) and most tuples are filtered out by other,
simple clauses, the total cost is likely to be overestimated. The
attached patch mitigates this overestimation by using selectivity
estimates of subsets of the clauses in a filter.

Suppose that there are three qual clauses in a scan node. Current
postgres estimates the per-tuple cost of the filter to be:

   cost(A) + cost(B) + cost(C)

The basic idea of the attached patch is:

   cost(A) + clauselist_selectivity({A}) * cost(B)
           + clauselist_selectivity({A, B}) * cost(C)

We first noticed this overestimation during performance analysis of
our real applications. In order to reproduce the situation, we created
a similar query with pgbench data. The database was prepared by the
following commands:

   pgbench --scale=1000 -i pgb5
   pgbench -c 10 -t 100000 pgb5

and the query is:

   select e.tid, e.aid, e.bid, e.mtime
   from pgbench_history e
   where e.tid between 1 and 1000
     and (select count(*) from pgbench_history f
          where f.mtime < e.mtime and f.bid = e.bid
          group by f.bid) > 100
     and e.aid between 1 and 10;

== Query plan with current upstream ==

Seq Scan on pgbench_history e  (cost=0.00..21393523889.00 rows=28 width=20) (actual time=2391.683..21775.191 rows=85 loops=1)
  Filter: ((tid >= 1) AND (tid <= 1000) AND (aid >= 1) AND (aid <= 10) AND ((SubPlan 1) > 100))
  Rows Removed by Filter: 999915
  SubPlan 1
    ->  GroupAggregate  (cost=0.00..21393.50 rows=283 width=12) (actual time=229.050..229.050 rows=1 loops=94)
          Group Key: f.bid
          ->  Seq Scan on pgbench_history f  (cost=0.00..21389.00 rows=333 width=4) (actual time=5.036..228.851 rows=529 loops=94)
                Filter: ((mtime < e.mtime) AND (bid = e.bid))
                Rows Removed by Filter: 999471

Most of the total cost of 21,393,523,889 comes from:

   (cost of SubPlan 1) * (# of rows in pgbench_history)
     = 21,393.50 * 1,000,000 = 21,393,500,000

but in the actual run, SubPlan 1 was executed only 94 times, so the
total cost was overestimated by a factor of more than 10,000.

== Query plan with patched upstream ==

Seq Scan on pgbench_history e  (cost=0.00..1687169.88 rows=28 width=20) (actual time=2388.802..21721.498 rows=85 loops=1)
  Filter: ((tid >= 1) AND (tid <= 1000) AND (aid >= 1) AND (aid <= 10) AND ((SubPlan 1) > 100))
  Rows Removed by Filter: 999915
  SubPlan 1
    ->  GroupAggregate  (cost=0.00..19726.83 rows=283 width=12) (actual time=228.507..228.507 rows=1 loops=94)
          Group Key: f.bid
          ->  Seq Scan on pgbench_history f  (cost=0.00..19722.33 rows=333 width=4) (actual time=5.378..228.316 rows=529 loops=94)
                Filter: ((mtime < e.mtime) AND (bid = e.bid))
                Rows Removed by Filter: 999471

With the patched upstream, the total cost is greatly decreased. In
cost estimation, only 84.4 tuples of pgbench_history are expected to
be evaluated with SubPlan 1. That is reasonably close to the actual
number, 94, and the total cost seems much more reasonable than the
cost with current upstream.

Any thoughts?

Regards,
Yuto Hayamizu

Mitigate-filter-cost-overestimation.patch
Description: Binary data
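To make the proposed computation concrete, here is a minimal sketch of
the costing loop described above, written against the PG 10-era planner
APIs. cost_qual_eval_node() and clauselist_selectivity() are real
planner functions with these signatures; the helper's name and its
placement are assumptions for illustration, not the actual patch.

/*
 * Sketch only: per-tuple filter cost where each clause's cost is
 * discounted by the estimated selectivity of the clauses ordered
 * before it, i.e. cost(A) + sel({A})*cost(B) + sel({A,B})*cost(C).
 * Startup costs and the security-level ordering constraints honored
 * by order_qual_clauses() are ignored here.
 */
#include "postgres.h"
#include "optimizer/cost.h"	/* cost_qual_eval_node, clauselist_selectivity */

static Cost
filter_pertuple_cost_sketch(PlannerInfo *root, List *ordered_clauses)
{
	Cost		per_tuple = 0.0;
	List	   *clauses_so_far = NIL;
	ListCell   *lc;

	foreach(lc, ordered_clauses)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
		Selectivity sel = 1.0;
		QualCost	qcost;

		/* fraction of tuples expected to survive the earlier clauses */
		if (clauses_so_far != NIL)
			sel = clauselist_selectivity(root, clauses_so_far, 0,
										 JOIN_INNER, NULL);

		cost_qual_eval_node(&qcost, (Node *) rinfo->clause, root);
		per_tuple += sel * qcost.per_tuple;

		clauses_so_far = lappend(clauses_so_far, rinfo);
	}

	list_free(clauses_so_far);
	return per_tuple;
}

Note that this calls clauselist_selectivity() over a growing prefix for
every clause, so it is quadratic in the number of quals -- which is
exactly the planning-cost concern Tom raises in his reply.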