Re: [HACKERS] Choosing the cheapest optimizer cost

2016-06-21 Thread Bruce Momjian
On Tue, Jun 21, 2016 at 11:17:19AM -0400, Robert Haas wrote:
> If the index scans are parameterized by values from the seq scan,
> which is likely the situation in which this sort of plan will be
> generated, we'll pay the extra cost of building the hash table once
> per row in something_big.
> 
> I think we should consider switching from a nested loop to a hash join
> on the fly if the outer relation turns out to be bigger than expected.
> We could work out during planning what the expected breakeven point
> is; if the actual outer row count passes that, switch to a hash join.
> This has been discussed before, but nobody's tried to do the work,
> AFAIK.

Yes, the idea of either adjusting the execution plan when row counts turn
out to be inaccurate, or feeding information about misestimation back to
the optimizer for future queries, is something I hope we try someday.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +



Re: [HACKERS] Choosing the cheapest optimizer cost

2016-06-21 Thread Robert Haas
On Wed, Jun 15, 2016 at 3:46 PM, Bruce Momjian  wrote:
> Right now, the optimizer chooses the path with the cheapest cost.
>
> However, do we take into account the behavior of the plan in handling
> mis-estimated row counts?

No.

> For example, if a path has a log(n) behavior
> for changes in the row count, and another plan that is slightly cheaper
> has a log(n^2) behavior, should we choose the former, knowing that the
> row counts are often inaccurate?

Maybe.  It's not really that simple, though.  In practice, the
decision we have to make is something like: should we use a nested
loop here, which will be better if either the inner or outer relation
has < 2 rows, or should we use a hash join, which will be better if
both sides are big?  The nested loop can be really, really bad if it
turns out that the inner side which we expected to have 1 row actually
has 1,000,000 rows, but the hash join can lose, too.  Consider:

Nested Loop Left Join
-> Seq Scan on something_big
-> Hash Join
  -> Index Scan on at_most_one_row_expected
  -> Hash
    -> Index Scan on not_very_many_rows

If the index scans are parameterized by values from the seq scan,
which is likely the situation in which this sort of plan will be
generated, we'll pay the extra cost of building the hash table once
per row in something_big.
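
For concreteness, here is one hypothetical query shape that could
produce a plan like the one above.  All of the names are invented and
this is only a sketch; whether the planner actually picks this shape
depends on the statistics and costs involved:

-- Hypothetical example only: table, column, and index names are made up.
SELECT b.id, sub.a_val, sub.n_val
FROM something_big b
LEFT JOIN LATERAL (
    SELECT a.val AS a_val, n.val AS n_val
    FROM at_most_one_row_expected a
    JOIN not_very_many_rows n ON n.grp = a.grp
    WHERE a.big_id = b.id
      AND n.big_id = b.id
) sub ON true;

Because the scan of not_very_many_rows is qualified by b.id, its hash
table cannot be reused across outer rows and would have to be rebuilt
for every row of something_big.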

I think we should consider switching from a nested loop to a hash join
on the fly if the outer relation turns out to be bigger than expected.
We could work out during planning what the expected breakeven point
is; if the actual outer row count passes that, switch to a hash join.
This has been discussed before, but nobody's tried to do the work,
AFAIK.
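
As a rough illustration of the breakeven idea: with a simple linear
cost model and invented numbers, the crossover point could be computed
like this (the costs below are assumptions for illustration, not real
planner output):

-- Illustrative arithmetic only; the startup and per-outer-row costs are made up.
-- breakeven = (hash startup - nestloop startup) / (nestloop per-row - hash per-row)
SELECT (hj_startup - nl_startup) / (nl_per_row - hj_per_row) AS breakeven_outer_rows
FROM (VALUES (0.0, 10.0, 500.0, 0.5))
       AS c(nl_startup, nl_per_row, hj_startup, hj_per_row);
-- => about 52.6, so past roughly 53 outer rows the hash join is
--    expected to win.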

> I suppose one approach would be to track not only the path costs, but
> also how each path handles mis-estimation, and account for that in the final path
> choice?  Do we already do this by giving less stable plans higher costs?
> Does that have the same effect?

The problem with tracking more things during planning is that it
increases the cost of planning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Choosing the cheapest optimizer cost

2016-06-15 Thread Bruce Momjian
Right now, the optimizer chooses the path with the cheapest cost.

However, do we take into account the behavior of the plan in handling
mis-estimated row counts?  For example, if a path has a log(n) behavior
for changes in the row count, and another plan that is slightly cheaper
has a log(n^2) behavior, should we choose the former, knowing that the
row counts are often inaccurate?

I suppose one approach would be to track not only the path costs, but
also how each path handles mis-estimation, and account for that in the final path
choice?  Do we already do this by giving less stable plans higher costs?
Does that have the same effect?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

