Bruno Wolff III wrote:
> I haven't come up with any great ideas for this one. It might be interesting
> to compare the explain analyze output from the distinct on query with
> and without seqscans enabled.

Can't do that comparison. Remember, with seqscans enabled it fails. (Oh, and that nested loops solution I thought was fast actually took 31 minutes, versus 29 minutes for the index scan, in 7.4b2.)

I ran another query across the same data:

select price_date, count(*) from day_ends group by price_date;

It used a table scan and hashed aggregates, and it ran in 5.5 minutes. Considering that, pgsql should be able to run the query I had been running in only a little more time than that. So...
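To make the fast plan concrete, here is a minimal Python sketch of what "table scan and hashed aggregates" amounts to for the GROUP BY query above: one pass over the rows, updating a hash table keyed by price_date, with no sort at all. This only models the shape of the work, not PostgreSQL's executor; the row layout and the `symbol` column are hypothetical, and the function name is made up for illustration.

```python
from collections import Counter
from datetime import date


def hash_aggregate(rows):
    """Model of seq scan + hashed aggregation for
    SELECT price_date, count(*) FROM day_ends GROUP BY price_date.

    `rows` stands in for the table (hypothetical dicts with a
    'price_date' key). Each row is visited once and only one hash
    entry per distinct price_date is kept; nothing is sorted.
    """
    counts = Counter()
    for row in rows:                      # the sequential (table) scan
        counts[row["price_date"]] += 1    # update this group's hash entry
    return dict(counts)


# Hypothetical sample rows; the real day_ends table had ~17 million.
rows = [
    {"price_date": date(2003, 8, 1), "symbol": "AAA"},
    {"price_date": date(2003, 8, 4), "symbol": "AAA"},
    {"price_date": date(2003, 8, 1), "symbol": "BBB"},
]
print(hash_aggregate(rows))
```

Memory here grows with the number of distinct dates, not with the number of rows, which is why this plan stays cheap even on a large table.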

From what I've learned, we want to convince the optimizer to use a table scan; that's a good thing. I want it to use hashed aggregates, but I can't convince it to (unless, maybe, I removed all of the statistics). To use group aggregates, it first sorts the results of the table scan (all 17 million rows!). There ought to be some way to tell pgsql not to do sorts above a certain size. In this case, if I set enable_sort=false, it goes back to the index scan; if I then set enable_indexscan=false, it goes back to sorting.
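For contrast, a minimal Python sketch of the group-aggregate path described above: every row coming out of the table scan is sorted first, and only then are runs of equal price_date values counted. Again this models the shape of the work under hypothetical row dicts, not PostgreSQL internals; the function name is illustrative only.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def sort_group_aggregate(rows):
    """Model of seq scan + Sort + group aggregation.

    The sort touches every input row (O(n log n) in the row count),
    which is the costly step when n is ~17 million; the grouping pass
    afterwards only has to count consecutive rows with equal keys.
    """
    key = itemgetter("price_date")
    ordered = sorted(rows, key=key)   # sorts ALL rows, not just the groups
    return [(k, sum(1 for _ in grp)) for k, grp in groupby(ordered, key=key)]


# Hypothetical sample rows (same shape as a day_ends row might have).
rows = [
    {"price_date": date(2003, 8, 4), "symbol": "AAA"},
    {"price_date": date(2003, 8, 1), "symbol": "AAA"},
    {"price_date": date(2003, 8, 1), "symbol": "BBB"},
]
print(sort_group_aggregate(rows))
```

Both strategies produce the same counts; the difference is that this one pays for sorting the whole input, while the hashed version only keeps one entry per distinct date.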

