Hi David ;-)

On 2.3.2015 20:19, David Kubečka wrote:
>
> The question is why optimizer, or rather the cost estimator,
> produced so much different estimates upon very small change in input.
> Moreover it seems that the main culprit of bad estimates isn't
> actually directly related to outer table, but rather to inner table.
> Just compare estimates for the two index scans:
> 
> With 'random_fk_dupl':
>          ->  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
> rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
> With 'random_fk_uniq':
>          ->  Index Scan using facts_fk_idx on facts  (cost=0.42..214.26
> rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)
> 
> I have read the optimizer README file and also looked briefly at the
>  code, but this seems to be something not related to particular 
> implementation of algorithm (e.g. nested loop). Perhaps it's the way 
> how cost estimates are propagated down (or sideways? that would be 
> weird...) the query tree. But I am really not sure, since this is my 
> first time lookng at the optimizer code base. I should also add that 
> I have reproduced this behaviour for all versions of Pg from 9.2 up 
> to current devel.

Interesting. I've only spent a few minutes looking at this, but it
certainly is a bit strange that using smaller table with unique values
results in a slower plan than using a table with duplicities (but the
same number of unique values).

Another observation is that the planner does see other plans, because
tweaking the cost variables like this:

    SET random_page_cost = 2;

produces this plan (which on my machine runs just a tad slower than the
first query in your post):

                            QUERY PLAN
---------------------------------------------------------------------
 HashAggregate  (cost=10564.81..10688.47 rows=9893 width=15)
   Group Key: facts.fk
   ->  Nested Loop  (cost=5.42..10515.34 rows=9893 width=15)
       ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
             Group Key: random_fk_uniq.fk
             ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
       ->  Bitmap Heap Scan on facts  (cost=3.18..105.18 rows=100 ...)
           Recheck Cond: (fk = random_fk_uniq.fk)
           ->  Bitmap Index Scan on facts_fk_idx  (cost=0.00..3.15 ...)
                     Index Cond: (fk = random_fk_uniq.fk)

I can get similar results by setting cpu_operator_cost=0.005. And
further lowering to random_page_cost to 1.1 actually gets me this:

                            QUERY PLAN
---------------------------------------------------------------------
 HashAggregate  (cost=6185.41..6309.08 rows=9893 width=15)
   Group Key: facts.fk
   ->  Nested Loop  (cost=2.66..6135.95 rows=9893 width=15)
      ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
            Group Key: random_fk_uniq.fk
            ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
      ->  Index Scan using facts_fk_idx on facts  (cost=0.42..60.95 ...)
               Index Cond: (fk = random_fk_uniq.fk)

which is exactly the first plan from your post.

I still don't understand why using the smaller table produces less
efficient plan, but the estimated costs are very clost (20013 vs. 18147
is very small difference in this context), and the estimator shoots for
minimal average error. So it may seem 'weirdly pessimistic' but it might
be equally optimistic for other queries.

Also, keep in mind that the cost model is just a simplification of
reality, so it can't be correct 100% of the time.

The fact that this small difference in costs results in significant
duration difference suggests that the default optimizer cost values do
not reflect your environment. For example, you assume that this is very
fast:

    NestLoop
        -> Seq Scan on A
        -> Index Scan using B_Y_IDX on B
            Index Condition: B.Y = A.X

but index scans often produce a lot of random I/O, and that may be quite
expensive if it really hits the storage. And that's what the default
values (random_page_cost=4) is set for. But if you're on a system with
lots of RAM, or SSD, ... that simply is not the case, and may penalize
index scans (and prefer more sequential workloads).

The other thing you might do is creating index on (fk,f), which on the
new releases produces this:

                           QUERY PLAN
-------------------------------------------------------------------------
 HashAggregate  (cost=783.77..907.43 rows=9893 width=15)
   Group Key: facts.fk
   ->  Nested Loop  (cost=2.66..734.30 rows=9893 width=15)
       ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
           Group Key: random_fk_uniq.fk
           ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
       ->  Index Only Scan using facts_2_idx on facts  (cost=...)
               Index Cond: (fk = random_fk_uniq.fk)

Which limits the amount of random I/O by only scanning the index, and is
even faster than the first query.

regard

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to