Re: [PERFORM] index v. seqscan for certain values
On Tue, 13 Apr 2004 13:55:49 -0400, Tom Lane [EMAIL PROTECTED] wrote:
> Possibly the nonuniform clumping of CID has something to do with the
> poor results.

It shouldn't.  The sampling algorithm is designed to give each tuple the
same chance of ending up in the sample, and tuples are selected
independently.  (IOW each one of the {N \choose n} possible samples has
the same probability.)  There are known problems with nonuniform
distribution of dead vs. live and large vs. small tuples, but AFAICS the
order of values does not matter.

Servus
 Manfred
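(To spell out why physical order can't matter under that scheme: with N
tuples in the table and a sample of size n, a given tuple appears in
{N-1 \choose n-1} of the {N \choose n} equally likely samples, so its
inclusion probability is {N-1 \choose n-1} / {N \choose n} = n/N --
identical for every tuple, regardless of where it sits in the file.)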
Re: [PERFORM] index v. seqscan for certain values
>> When I just tried it again with a value of 300, analyze, then run the
>> query, I get a *worse* result for an estimate.  I don't understand
>> this.
>
> That's annoying.  How repeatable are these results --- if you do
> ANALYZE over again several times, how much does the row count estimate
> change each time?  (It should change somewhat, since ANALYZE is taking
> a random sample, but one would like to think not a whole lot.)  Is the
> variance more or less at the higher stats target?  Take a look at a
> few different CID values to get a sense of the accuracy, don't look at
> just one ...

Yes, it's repeatable.  I tried a bunch of times, and there are only
small variations in the stats for the higher stat targets.

> (Actually, you might find it more profitable to look at the pg_stats
> entry for the CID column rather than reverse-engineering the stats via
> ANALYZE.  Look at how well the most-common-values list and associated
> frequency numbers track reality.)

I checked the accuracy of the stats for various values, and there is a
wide variation.  I see some values where the estimate is 1.75x the
actual, and others where the estimate is .44x the actual.

> Also, can you think of any reason for the distribution of CID values
> to be nonuniform within the table?  For instance, do rows get inserted
> in order of increasing CID, or is there any clustering of rows with
> the same CID?

This is almost certainly the answer.  The data is initially inserted in
chunks for each CID, and later on there is a more normal distribution of
inserts/updates/deletes across all CIDs; and then again a new CID will
come with a large chunk of rows, etc.

Interestingly, I tried increasing the stat size for the CID column to
2000, analyzing, and checking the accuracy of the stats again.  Even
with this relatively high value, the accuracy of the stats is not that
close.  The value giving .44x previously now gives an estimate .77x of
actual.  Another value which was at 1.38x of actual is now at .71x of
actual!

Then just for kicks I set the statistics size to 100,000 (!), analyzed,
and ran the query again.  For the same CID I still got an estimated row
count that is .71x the actual rows returned.  Why is this not better?

I wonder how high I'd have to set the statistics target to get really
good data, given the uneven data distribution of this table.  Is there
any other technique that works better to get good estimates, given an
uneven distribution of values?

So I think this explains the inaccurate stats, and the solution as far
as I'm concerned is to increase the two params mentioned yesterday
(effective_cache_size and random_page_cost).

Thanks again for the help!

- Jeremy
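(For anyone reproducing this, the accuracy check above just amounts to
comparing the planner's rows= figure against a real count for several
CID values, e.g.:

    -- the rows= figure in the plan output is the estimate ...
    EXPLAIN SELECT count(*) FROM xxx WHERE cid = 7191032;
    -- ... and this is the actual count to compare it against
    SELECT count(*) FROM xxx WHERE cid = 7191032;

using the table and CID values quoted elsewhere in the thread.)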
Re: [PERFORM] index v. seqscan for certain values
Jeremy Dunn [EMAIL PROTECTED] writes:
> Interestingly, I tried increasing the stat size for the CID column to
> 2000, analyzing, and checking the accuracy of the stats again.

There's a hard limit of 1000, I believe.  Didn't it give you a warning
saying so?

At 1000 the ANALYZE sample size would be 300000 rows, or about a quarter
of your table.  I would have thought this would give frequency estimates
with much better precision than you seem to be seeing --- but my
statistics are rusty enough that I'm not sure about it.  Possibly the
nonuniform clumping of CID has something to do with the poor results.

Any stats majors on the list?

			regards, tom lane
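(For reference, the sample size scales with the target: ANALYZE samples
roughly 300 rows per unit of statistics target, so a target of 1000
means 300 * 1000 = 300,000 sampled rows -- and 300,000 / 1,300,000 is
indeed about a quarter of the 1.3M-row table.)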
Re: [PERFORM] index v. seqscan for certain values
> There's a hard limit of 1000, I believe.  Didn't it give you a warning
> saying so?

No warning at 2000, and no warning at 100,000 either!  Remember we are
still on 7.2.x.  The docs here
http://www.postgresql.org/docs/7.2/static/sql-altertable.html
don't say anything about a limit.

This is good to know, if it's true.  Can anyone confirm?

- Jeremy
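(One way to see what value actually got recorded is to read it back out
of the system catalogs -- a sketch, using the catalog column names of
that era, so it may need adjusting on other versions:

    -- pg_attribute.attstattarget holds the per-column statistics target
    SELECT a.attname, a.attstattarget
    FROM pg_attribute a
    JOIN pg_class c ON c.oid = a.attrelid
    WHERE c.relname = 'xxx' AND a.attname = 'cid';
)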
Re: [PERFORM] index v. seqscan for certain values
Quick bit of input, since you didn't mention it.

How often do you run ANALYZE?  I found it interesting that a database I
was doing tests on sped up by a factor of 20 after ANALYZE.  If your
data changes a lot, you should probably schedule ANALYZE to run with
VACUUM.

Jeremy Dunn wrote:
> I've searched the archives and can't find an answer to this seemingly
> simple question.  Apologies if it's too common.
>
> The table in question has ~1.3M rows.  It has 85 columns, 5 of which
> have single-column indexes.  The column in question (CID) has 183
> distinct values.  For these values, the largest has ~38,000 rows, and
> the smallest has 1 row.  About 30 values have < 100 rows, and about 10
> values have > 20,000 rows.
>
> The database is 7.2.3 running on RedHat 7.1.  (We are in the process
> of upgrading to PG 7.4.2.)  All of the query plan options are enabled,
> and the cpu costs are set to the default values (cpu_tuple_cost is
> 0.01, cpu_index_tuple_cost is 0.001).  The database is VACUUM'd every
> night.
>
> The problem:
>
> A simple query:  select count(*) from xxx where CID=smalval  (where
> smalval is a CID value which has relatively few rows) returns a plan
> using the index on that column.
>
>   explain analyze select count(*) from xxx where cid=869366;
>   Aggregate  (cost=19136.33..19136.33 rows=1 width=0)
>     (actual time=78.49..78.49 rows=1 loops=1)
>     ->  Index Scan using xxx_cid on emailrcpts
>         (cost=0.00..19122.21 rows=5648 width=0)
>         (actual time=63.40..78.46 rows=1 loops=1)
>   Total runtime: 78.69 msec
>
> The same plan is true for values which have up to about 20,000 rows:
>
>   explain analyze select count(*) from xxx where cid=6223341;
>   Aggregate  (cost=74384.19..74384.19 rows=1 width=0)
>     (actual time=11614.89..11614.89 rows=1 loops=1)
>     ->  Index Scan using xxx_cid on emailrcpts
>         (cost=0.00..74329.26 rows=21974 width=0)
>         (actual time=35.75..11582.10 rows=20114 loops=1)
>   Total runtime: 11615.05 msec
>
> However for the values that have > 20,000 rows, the plan changes to a
> sequential scan, which is proportionately much slower.
>
>   explain analyze select count(*) from xxx where cid=7191032;
>   Aggregate  (cost=97357.61..97357.61 rows=1 width=0)
>     (actual time=46427.81..46427.82 rows=1 loops=1)
>     ->  Seq Scan on xxx  (cost=0.00..97230.62 rows=50792 width=0)
>         (actual time=9104.45..46370.27 rows=37765 loops=1)
>   Total runtime: 46428.00 msec
>
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?  In terms of elapsed time it is more
> than twice as slow, proportionate to an index scan for the same number
> of rows.
>
> What I tried:
>
> A) alter table xxx alter column cid set statistics 500;
>    analyze xxx;
>    This does not affect the results.
>
> B) dropped/rebuilt the index, with no improvement.
>
> C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no
>    success.
>
> D) force an index scan for the larger values by using a very high
>    value for cpu_tuple_cost (e.g. .5), but this doesn't seem like a
>    wise thing to do.
>
> Your thoughts appreciated in advance!
>
> - Jeremy
>
> 7+ years experience in Oracle performance-tuning
> relatively new to postgresql

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
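(The combined maintenance Bill suggests is a single statement, so a
nightly job can simply run it via psql or a scheduler of your choice:

    -- refreshes planner statistics in the same pass as the vacuum
    VACUUM ANALYZE xxx;
)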
Re: [PERFORM] index v. seqscan for certain values
Sorry, I should have written that we do VACUUM VERBOSE ANALYZE every
night.

- Jeremy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Bill Moran
Sent: Monday, April 12, 2004 12:09 PM
To: [EMAIL PROTECTED]
Cc: Postgresql Performance
Subject: Re: [PERFORM] index v. seqscan for certain values

> Quick bit of input, since you didn't mention it.  How often do you run
> ANALYZE?
> [remainder of quoted message snipped -- identical to Bill Moran's post
> above]
Re: [PERFORM] index v. seqscan for certain values
On Mon, 12 Apr 2004, Jeremy Dunn wrote:
>   explain analyze select count(*) from xxx where cid=6223341;
>   Aggregate  (cost=74384.19..74384.19 rows=1 width=0)
>     (actual time=11614.89..11614.89 rows=1 loops=1)
>     ->  Index Scan using xxx_cid on emailrcpts
>         (cost=0.00..74329.26 rows=21974 width=0)
>         (actual time=35.75..11582.10 rows=20114 loops=1)
>   Total runtime: 11615.05 msec
>
> However for the values that have > 20,000 rows, the plan changes to a
> sequential scan, which is proportionately much slower.
>
>   explain analyze select count(*) from xxx where cid=7191032;
>   Aggregate  (cost=97357.61..97357.61 rows=1 width=0)
>     (actual time=46427.81..46427.82 rows=1 loops=1)
>     ->  Seq Scan on xxx  (cost=0.00..97230.62 rows=50792 width=0)
>         (actual time=9104.45..46370.27 rows=37765 loops=1)
>   Total runtime: 46428.00 msec
>
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?  In terms of elapsed time it is more
> than twice as slow, proportionate to an index scan for the same number
> of rows.

One thing to do is to set enable_seqscan=off and run the above and
compare the estimated and real costs.  It may be possible to lower
random_page_cost to a still reasonable number in order to move the
point of the switchover to seqscan.
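(For example, using one of the CID values from the plans above:

    -- disable seqscans for this session only, to see what the planner
    -- thinks the index scan costs when it is the only choice
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT count(*) FROM xxx WHERE cid = 7191032;

    -- then re-enable and compare against the seqscan plan
    SET enable_seqscan = on;
    EXPLAIN ANALYZE SELECT count(*) FROM xxx WHERE cid = 7191032;
)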
Re: [PERFORM] index v. seqscan for certain values
Jeremy Dunn [EMAIL PROTECTED] writes:
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?

At some point a seqscan *will* be better.  In the limit, if the key
being sought is common enough to occur on every page of the table, it's
certain that a seqscan will require less I/O than an indexscan (because
reading the index isn't actually saving you any heap fetches).  In
practice the breakeven point is less than that because Unix kernels are
better at handling sequential than random access.

Your gripe appears to be basically that the planner's idea of the
breakeven point is off a bit.  It looks to me like it's within about a
factor of 2 of being right, though, which is not all that bad when it's
using generic cost parameters.

> A) alter table xxx alter column cid set statistics 500;
>    analyze xxx;
>    This does not affect the results.

It probably improved the accuracy of the row count estimates, no?  The
estimate you show for cid=7191032 is off by more than 25% (37765 vs
50792), which seems like a lot of error for one of the most common
values in the table.  (I hope that was with default stats target and
not 500.)  That leads directly to a 25% overestimate of the cost of an
indexscan, while having IIRC no impact on the cost of a seqscan.  Since
the cost ratio was more than 25%, this didn't change the selected plan,
but you want to fix that error as best you can before you move on to
tweaking cost parameters.

> C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no
>    success

Wrong thing.  You should be tweaking random_page_cost.  Looks to me
like a value near 2 might be appropriate for your setup.  Also it is
likely appropriate to increase effective_cache_size, which is awfully
small in the default configuration.  I'd set that to something related
to your available RAM before trying to home in on a suitable
random_page_cost.  AFAIK hardly anyone bothers with changing the
cpu_xxx costs ...

			regards, tom lane
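(The knob can be tried per-session before committing anything to
postgresql.conf -- a sketch, with the value Tom suggests:

    -- experiment first; make it permanent only if plans improve broadly
    SET random_page_cost = 2;
    EXPLAIN ANALYZE SELECT count(*) FROM xxx WHERE cid = 7191032;
)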
Re: [PERFORM] index v. seqscan for certain values
> Jeremy Dunn [EMAIL PROTECTED] writes:
>> The question: why does the planner consider a sequential scan to be
>> better for these top 10 values?
>
> At some point a seqscan *will* be better.  In the limit, if the key
> being sought is common enough to occur on every page of the table,
> it's certain that a seqscan will require less I/O than an indexscan
> (because reading the index isn't actually saving you any heap
> fetches).  In practice the breakeven point is less than that because
> Unix kernels are better at handling sequential than random access.
>
> Your gripe appears to be basically that the planner's idea of the
> breakeven point is off a bit.  It looks to me like it's within about a
> factor of 2 of being right, though, which is not all that bad when
> it's using generic cost parameters.

Agreed.  However, given that count(*) is a question that can be answered
_solely_ using the index (without reference to the actual data blocks),
I'd expect that the break-even point would be considerably higher than
the 3% (~38,000 / ~1.3M) I'm currently getting.  Does PG not use solely
the index in this situation??

>> A) alter table xxx alter column cid set statistics 500;
>>    analyze xxx;
>>    This does not affect the results.
>
> It probably improved the accuracy of the row count estimates, no?  The
> estimate you show for cid=7191032 is off by more than 25% (37765 vs
> 50792), which seems like a lot of error for one of the most common
> values in the table.  (I hope that was with default stats target and
> not 500.)

Actually it made them worse!  Yes, this was with the default statistics
target (10).  When I just tried it again with a value of 300, analyze,
then run the query, I get a *worse* result for an estimate.  I don't
understand this.

  alter table xxx alter column cid set statistics 300;
  analyze emailrcpts;
  set random_page_cost to 2;
  explain analyze select count(*) from xxx where cid=7191032;

  Aggregate  (cost=20563.28..20563.28 rows=1 width=0)
    (actual time=7653.90..7653.90 rows=1 loops=1)
    ->  Index Scan using xxx_cid on xxx
        (cost=0.00..20535.82 rows=10983 width=0)
        (actual time=72.24..7602.38 rows=37765 loops=1)
  Total runtime: 7654.14 msec

Now it estimates I have only 10,983 rows (~3x too low) instead of the
old estimate of 50,792 (1.3x too high).  Why is that??

Anyway, a workable solution seems to be using a lower value for
random_page_cost.  Thanks to everyone who replied with this answer.

> Also it is likely appropriate to increase effective_cache_size, which
> is awfully small in the default configuration.  I'd set that to
> something related to your available RAM before trying to home in on a
> suitable random_page_cost.

We have ours set to the default value of 1000, which does seem low for
a system with 1GB of RAM.  We'll up this once we figure out what's
available, then tweak random_page_cost appropriately at that point.

I'd still like to understand the strangeness above, if anyone can shed
light.

- Jeremy
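(For the record, the arithmetic behind that setting:
effective_cache_size is measured in 8KB disk pages, so if, say, half of
the 1GB box ends up as OS file cache, that's about 512MB / 8KB = 65536
pages.  A sketch, not a recommendation -- the right number depends on
what else the machine runs:

    -- illustrative only; size this to the RAM actually free for caching
    SET effective_cache_size = 65536;
)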
Re: [PERFORM] index v. seqscan for certain values
On Mon, Apr 12, 2004 at 15:05:02 -0400, Jeremy Dunn [EMAIL PROTECTED]
wrote:
> Agreed.  However, given that count(*) is a question that can be
> answered _solely_ using the index (without reference to the actual
> data blocks), I'd expect that the break-even point would be
> considerably higher than the 3% (~38,000 / ~1.3M) I'm currently
> getting.  Does PG not use solely the index in this situation??

That isn't true.  In order to check visibility you need to look at the
data blocks.
Re: [PERFORM] index v. seqscan for certain values
Jeremy Dunn [EMAIL PROTECTED] writes:
> Agreed.  However, given that count(*) is a question that can be
> answered _solely_ using the index (without reference to the actual
> data blocks), ...

As Bruno noted, that is not the case in Postgres; we must visit the
table rows anyway.

> When I just tried it again with a value of 300, analyze, then run the
> query, I get a *worse* result for an estimate.  I don't understand
> this.

That's annoying.  How repeatable are these results --- if you do ANALYZE
over again several times, how much does the row count estimate change
each time?  (It should change somewhat, since ANALYZE is taking a random
sample, but one would like to think not a whole lot.)  Is the variance
more or less at the higher stats target?  Take a look at a few different
CID values to get a sense of the accuracy, don't look at just one ...

(Actually, you might find it more profitable to look at the pg_stats
entry for the CID column rather than reverse-engineering the stats via
ANALYZE.  Look at how well the most-common-values list and associated
frequency numbers track reality.)

Also, can you think of any reason for the distribution of CID values to
be nonuniform within the table?  For instance, do rows get inserted in
order of increasing CID, or is there any clustering of rows with the
same CID?

			regards, tom lane
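(The pg_stats check amounts to something like:

    -- how well do the sampled most-common values and their frequencies
    -- for cid track the real distribution?
    SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'xxx' AND attname = 'cid';

using the table and column names from this thread.)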