   Column  |     Type     | Modifiers
 ----------+--------------+-----------
  the_id   | integer      | not null
  the_date | date         | not null
  num1     | numeric(9,4) |
  num2     | numeric(9,4) |
  num3     | numeric(9,4) |
  num4     | numeric(9,4) |
  int1     | integer      |
 Indexes:
     "the_table_pkey" primary key, btree (the_id, the_date)
---------------------------------------
The query I want to run is
select stock_id, min(price_date) from day_ends group by stock_id;
---------------------------------------
Here's the plan that I get.
 GroupAggregate  (cost=3711244.30..3838308.31 rows=6732 width=8)
   ->  Sort  (cost=3711244.30..3753593.36 rows=16939624 width=8)
         Sort Key: stock_id
         ->  Seq Scan on day_ends  (cost=0.00..361892.24 rows=16939624 width=8)
If I set enable_seqscan = false, the plan changes to
 GroupAggregate  (cost=0.00..67716299.91 rows=6732 width=8)
   ->  Index Scan using day_ends_pkey on day_ends  (cost=0.00..67631584.96 rows=16939624 width=8)
---------------------------------------
Now... the first plan uses up tons of temporary space for sorting. The second one just runs and runs and runs. I've tried setting the statistics target to 1000, with little effect.
So the query can get everything it needs from the index, and a full scan of the index should be faster (the index file is less than half the size of the data file). So why does the optimizer's estimate come out so high?
Also, to get the MIN for a given group, not all values in the index need to be seen. Must pgsql do a full scan because it treats all aggregates the same way? Are MIN and MAX used often enough to justify special treatment, and could that be cleanly implemented? Perhaps an aggregate function could request its input in a certain order, be told that it is being passed data in that order, and return before seeing the entire set of data.
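(For what it's worth, the usual workaround when the planner treats MIN as an opaque aggregate is to rewrite the per-group minimum so the btree on (stock_id, price_date) can be walked directly. A sketch, assuming the primary key shown above; the literal stock_id value is hypothetical:

SELECT price_date
FROM day_ends
WHERE stock_id = 42          -- hypothetical id; one ordered probe per group
ORDER BY stock_id, price_date
LIMIT 1;

or, for all groups at once, the PostgreSQL-specific DISTINCT ON form, which keeps the first row per stock_id under that sort order, i.e. the minimum date:

SELECT DISTINCT ON (stock_id) stock_id, price_date
FROM day_ends
ORDER BY stock_id, price_date;

Neither is as clean as teaching the planner about MIN/MAX, but the LIMIT 1 form at least lets a single index probe replace a scan of the whole group.)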
Food for thought...
Thanks,
Ken Geis