Hi,

While working on a slow query of one of our customers, I found this
behaviour which seems quite weird to me. Here it is:

(no changes on any parameter)
EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;

                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Unique  (cost=12005.97..12049.20 rows=1 width=1430) (actual
time=20055.294..20323.348 rows=1 loops=1)
   ->  Sort  (cost=12005.97..12006.30 rows=132 width=1430) (actual
time=20055.290..20105.738 rows=60000 loops=1)
         Sort Key: (... 130 columns ...)
         Sort Method: external sort  Disk: 84464kB
         ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430)
(actual time=0.109..114.142 rows=60000 loops=1)
 Planning time: 10.012 ms
 Execution time: 20337.854 ms
(7 rows)

That looks quite good. The total cost is 12049, so I expect this plan to
have the smaller cost as it's the choosen plan. Now, I'm disabling Sort,
and here is what I get:

SET enable_sort TO off;
EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;

                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=12044.22..12044.23 rows=1 width=1430) (actual
time=508.342..508.343 rows=1 loops=1)
   Group Key: (... 130 columns ...)
   ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430) (actual
time=0.036..57.088 rows=60000 loops=1)
 Planning time: 1.335 ms
 Execution time: 508.529 ms
(5 rows)

I now have a Hash Aggregate. Sounds good, but I do have a smaller total
cost than the former plan. How is it possible that the planner chooses a
plan with a higher cost? (and a much much higher duration, but it can't
obviously know that).

Still working on my customer's issue, I found that it worked great before
commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f (Make the upper part of the
planner work by generating and comparing Paths.) and not after. That kind
of patch is way out of my league to be honest :) But before that commit,
the Hash Aggregate is choosen right out of the box. And after that commit,
the Sort is choosen even with its higher cost.

Oh, BTW, yeah, I know. The row estimate is quite off... I'm not asking how
to make my query go faster, I'm just wondering why PostgreSQL chooses a
higher-cost plan :)

Thanks.


-- 
Guillaume.

Reply via email to