Re: [PERFORM] Column correlation drifts, index ignored again

2004-06-06 Thread Tom Lane
John Siracusa [EMAIL PROTECTED] writes: I want to do something that will convince Postgres that using the date index is, by far, the best plan when running my queries, even when the date column correlation stat drops well below 1.0. Have you tried experimenting with random_page_cost?

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-24 Thread Josh Berkus
Kevin, 1. set enable_seqscan = on 2. set random_page_cost = some really high value to force seqscans 3. EXPLAIN ANALYZE query 4. record the ratio of estimated to actual scan times. 5. set enable_seqscan = off 6. set random_page_cost = rough estimate of what it should be 7. EXPLAIN

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-24 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes: Kevin, 1. set enable_seqscan = on 2. set random_page_cost = some really high value to force seqscans 3. EXPLAIN ANALYZE query 4. record the ratio of estimated to actual scan times. 5. set enable_seqscan = off 6. set random_page_cost = rough

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-24 Thread Kevin Brown
Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Kevin, 1. set enable_seqscan = on 2. set random_page_cost = some really high value to force seqscans 3. EXPLAIN ANALYZE query 4. record the ratio of estimated to actual scan times. 5. set enable_seqscan = off 6. set

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-24 Thread Ed L.
On Tuesday February 24 2004 1:14, Kevin Brown wrote: One problem I've been running into is the merge join spilling to disk because sort_mem isn't big enough. The problem isn't that this is happening, it's that I think the planner is underestimating the impact that doing this will have on the

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-24 Thread Kevin Brown
Ed L. wrote: How do you know the merge join is spilling to disk? How are you identifying that? Just assuming from vmstat? iostat? The existence of files in $PG_DATA/base/db-oid/pgsql_tmp while the query is running, combined with the EXPLAIN output (which shows what sorts and joins are being

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-24 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes: One problem I've been running into is the merge join spilling to disk because sort_mem isn't big enough. The problem isn't that this is happening, it's that I think the planner is underestimating the impact that doing this will have on the time the merge

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-23 Thread Kevin Brown
Josh Berkus wrote: John, I think the key is to get the planner to correctly ballpark the number of rows in the date range. If it does, I can't imagine it ever deciding to read 1,000,000 rows instead of 1,000 with any sane cost setting. I'm assuming the defaults are sane :) The

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-22 Thread Josh Berkus
On Saturday 21 February 2004 16:18, John Siracusa wrote: John, Next, thanks to my earlier thread, I clustered the table on the date column and then SET STATISTICS on the date column to be 100. That did the trick, and I stopped explicitly disabling seqscan. 100? Are you sure you don't mean

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-22 Thread John Siracusa
On 2/22/04 5:06 PM, Tom Lane wrote: John Siracusa [EMAIL PROTECTED] writes: I want to do something that will convince Postgres that using the date index is, by far, the best plan when running my queries, even when the date column correlation stat drops well below 1.0. Have you tried

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-22 Thread John Siracusa
On 2/22/04 6:40 PM, Tom Lane wrote: John Siracusa [EMAIL PROTECTED] writes: I think the key is to get the planner to correctly ballpark the number of rows in the date range. I thought it was. What you showed was - Index Scan using mytable_date_idx on mytable (cost=0.00..3071.70

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-22 Thread Josh Berkus
John, I think the key is to get the planner to correctly ballpark the number of rows in the date range. If it does, I can't imagine it ever deciding to read 1,000,000 rows instead of 1,000 with any sane cost setting. I'm assuming the defaults are sane :) The default for random_page_cost is

Re: [PERFORM] Column correlation drifts, index ignored again

2004-02-22 Thread Tom Lane
John Siracusa [EMAIL PROTECTED] writes: Does the planner get estimates from both plans before deciding whether or not to use the one that references the date index? The rowcount estimate is made prior to the plan cost estimate, much less the plan selection. So you'd see the same number either