Re: [PERFORM] cpu_tuple_cost
On Mon, 14 Mar 2005 21:23:29 -0500, Tom Lane [EMAIL PROTECTED] wrote:
> I think that the "reduce random_page_cost" mantra is not an indication that that parameter is wrong, but that the cost models it feeds into need more work.

One of these areas is the cost interpolation depending on correlation. This has been discussed on -hackers in October 2002 and August 2003 ("Correlation in cost_index()"). My Postgres installations contain the patch presented during that discussion (and another index correlation patch), and I use *higher* values for random_page_cost (up to 10).

Servus
Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] cpu_tuple_cost
Josh Berkus josh@agliodbs.com writes:
>> Yes it is. I ran experiments back in the late 90s to derive it. Check the archives.
> Hmmm ... which list?

-hackers, no doubt. -performance didn't exist then.

regards, tom lane
Re: [PERFORM] cpu_tuple_cost
On Thu, Mar 17, 2005 at 09:54:29AM -0800, Josh Berkus wrote:
>> Yes it is. I ran experiments back in the late 90s to derive it. Check the archives.
> Hmmm ... which list?

These look like relevant threads:

http://archives.postgresql.org/pgsql-hackers/2000-01/msg00910.php
http://archives.postgresql.org/pgsql-hackers/2000-02/msg00215.php

-- Michael Fuhr http://www.fuhr.org/~mfuhr/
Re: [PERFORM] cpu_tuple_cost
>>> The "this day and age" argument isn't very convincing. Hard drive capacity growth has far outstripped hard drive seek time and bandwidth improvements. Random access has more penalty than ever.
>>
>> In point of fact, there haven't been noticeable seek time improvements for years. Transfer rates, on the other hand, have gone through the roof.
>
> Er, yeah. I stated it wrong. The real ratio here is between seek time and throughput. Typical 7200RPM drives have average seek times in the area of 10ms. Typical sustained transfer rates are in the range of 40MB/s. Postgres reads 8kB blocks at a time. So 800kB/s for random access reads, and 40MB/s for sequential reads. That's a factor of 49. I don't think anyone wants random_page_cost to be set to 50, though.
>
> For a high-end 15k drive I see average seek times as low as 3ms, and sustained transfer rates as high as 100MB/s. So about 2.7MB/s for random access reads, or about a random_page_cost of 37. Still pretty extreme.
>
> So what's going on with the empirically derived value of 4? Perhaps this is because even though Postgres is reading an entire table sequentially it's unlikely to be the only I/O consumer? The sequential reads would be interleaved occasionally by some other I/O forcing a seek to continue.

What about the cache memory on the disk? Even IDE disks have some 8MB of cache today, which makes a lot of difference for fairly short scans, even if it's just read cache. That'll bring the speed of random access down to a 1:1 relationship with sequential access, assuming it all fits in the cache.

//Magnus
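The seek-versus-throughput arithmetic quoted above can be packaged as a small sketch. The drive figures (10ms seek, 40MB/s streaming, 8kB pages) are the illustrative numbers from the message, not measurements; the pure ratio comes out at 50 and 37.5, and the message's 49/37 figures just reflect slightly different rounding:

```python
def implied_random_page_cost(seek_ms, xfer_mb_s, page_kb=8.0):
    """Ratio of sequential to random page-read rates for one drive."""
    seq_pages_per_s = xfer_mb_s * 1000.0 / page_kb   # pure streaming reads
    rnd_pages_per_s = 1000.0 / seek_ms               # one full seek per page
    return seq_pages_per_s / rnd_pages_per_s

print(implied_random_page_cost(10, 40))   # 7200RPM drive -> 50.0
print(implied_random_page_cost(3, 100))   # 15k SCSI drive -> 37.5
```

The point of the sketch is the same as the message's: the raw hardware ratio is an order of magnitude above the default of 4, so something else (caching, readahead, contention) must be absorbing most of the seek penalty.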
Re: [PERFORM] cpu_tuple_cost
Greg,

> So 800kB/s for random access reads. And 40MB/s for sequential reads. That's a factor of 49. I don't think anyone wants random_page_cost to be set to 50 though. For a high end 15k drive I see average seek times get as low as 3ms. And sustained transfer rates get as high as 100MB/s. So about 2.7MB/s for random access reads or about a random_page_cost of 37. Still pretty extreme.

Actually, what you're demonstrating here is that there's really no point in having a random_page_cost GUC, since the seek/scan ratio is going to be high regardless. Although I can point out that you left out the fact that the disk needs to do a seek to find the beginning of the seq scan area, and even then some file fragmentation is possible. Finally, I've never seen PostgreSQL manage more than 70% of the maximum read rate, and in most cases more like 30%.

> So what's going on with the empirically derived value of 4?

It's not empirically derived; it's a value we plug into an internal-to-postgresql formula. And 4 is a fairly conservative value that works for a lot of systems. Realistically, the values we should be deriving from are:
-- median file cache size for postgresql files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those 4 hardware facts into formulas that allow us to calculate the actual cost of a query execution plan is somebody's PhD paper.

-- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] cpu_tuple_cost
Josh Berkus josh@agliodbs.com writes:
>> So what's going on with the empirically derived value of 4?
> It's not empirically derived;

Yes it is. I ran experiments back in the late 90s to derive it. Check the archives. Disks have gotten noticeably bigger since then, but I don't think the ratio of seek time to rotation rate has changed much.

regards, tom lane
Re: [PERFORM] cpu_tuple_cost
Gregory Stark wrote:
> The "this day and age" argument isn't very convincing. Hard drive capacity growth has far outstripped hard drive seek time and bandwidth improvements. Random access has more penalty than ever.

In point of fact, there haven't been noticeable seek time improvements for years. Transfer rates, on the other hand, have gone through the roof.

Which is why I would question the published tuning advice that recommends lowering it to 2 for arrays. Arrays increase the effective transfer rate more than they reduce random access times. Dropping from 4 to 2 would reflect going from a typical single 7200rpm ATA drive to a 15000rpm SCSI drive, but striping will move it back up again - probably even higher than 4 with a big array (at a guess, perhaps the relationship might be approximated as a square root after allowing for the array type?).

With default settings, I've seen the planner pick the wrong index unless random_page_cost was set to 2. But in testing on an ATA drive, I achieved slightly better plan costings by increasing cpu_tuple_cost (relative to cpu_index_tuple_cost - by default it's only a factor of 10) and actually *raising* random_page_cost to 5! So why pick on one parameter? It's all going to vary according to the query and the data.

I agree with Tom 100%. Pulling levers on a wonky model is no solution.
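David's experiment, rendered as a postgresql.conf sketch. The values are only what his message implies: random_page_cost = 5 is the setting he states, while the cpu_tuple_cost value is hypothetical since he says he raised it without giving a number. The defaults in the comments are the era's stock settings, and none of this is a general recommendation:

```ini
# Illustrative fragment only -- values from a single-ATA-drive test, not advice.
random_page_cost = 5            # default 4; raised, not lowered, in this test
cpu_tuple_cost = 0.03           # default 0.01; hypothetical "raised" value
cpu_index_tuple_cost = 0.001    # default; only 10x below cpu_tuple_cost's default
```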
Re: [PERFORM] cpu_tuple_cost
David Brown [EMAIL PROTECTED] writes:
>> Gregory Stark wrote:
>>> The "this day and age" argument isn't very convincing. Hard drive capacity growth has far outstripped hard drive seek time and bandwidth improvements. Random access has more penalty than ever.
>> In point of fact, there haven't been noticeable seek time improvements for years. Transfer rates, on the other hand, have gone through the roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and throughput. Typical 7200RPM drives have average seek times in the area of 10ms. Typical sustained transfer rates are in the range of 40MB/s. Postgres reads 8kB blocks at a time. So 800kB/s for random access reads, and 40MB/s for sequential reads. That's a factor of 49. I don't think anyone wants random_page_cost to be set to 50, though.

For a high-end 15k drive I see average seek times as low as 3ms, and sustained transfer rates as high as 100MB/s. So about 2.7MB/s for random access reads, or about a random_page_cost of 37. Still pretty extreme.

So what's going on with the empirically derived value of 4? Perhaps this is because even though Postgres is reading an entire table sequentially it's unlikely to be the only I/O consumer? The sequential reads would be interleaved occasionally by some other I/O forcing a seek to continue.

In which case the true random_page_cost seems like it would be extremely sensitive to the amount of readahead the OS does. To reach a random_page_cost of 4 given the numbers above for a 7200RPM drive requires that just under 25% of the I/O of a sequential table scan be random seeks [*]. That translates to 32kB of sequential reading, which actually does sound like a typical value for OS readahead. I wonder if those same empirical tests would show even higher values of random_page_cost if the readahead were turned up to 64kB or 128kB.

[*] A bit of an algebraic diversion: 1s/10ms = 100 random buffers/s; random_page_cost = 4, so net sequential buffers/s = 400. Solve:

    rnd + seq = 400 buffers
    0.2ms*seq + 10ms*rnd = 1000ms

-- greg
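Greg's footnote can be checked numerically. A minimal sketch, using the same assumed drive figures as above (10ms average seek, 0.2ms to transfer one 8kB page at 40MB/s):

```python
# Solve Greg's two-equation system:
#   rnd + seq = 400 buffers          (random_page_cost=4 => 4 * 100 buffers/s)
#   0.2ms*seq + 10ms*rnd = 1000ms    (one second of wall time)
seek_ms = 10.0
xfer_ms = 8 / 40000 * 1000          # one 8kB page at 40MB/s = 0.2ms
target_buffers = 400

rnd = (1000 - xfer_ms * target_buffers) / (seek_ms - xfer_ms)
seq = target_buffers - rnd

print(round(rnd, 1), round(seq, 1))          # ~93.9 random, ~306.1 sequential
print(round(100 * rnd / target_buffers, 1))  # ~23.5% random: "just under 25%"
```

So a random_page_cost of 4 on that hardware corresponds to roughly one seek per three to four sequentially read pages, which is where the ~32kB readahead observation comes from.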
Re: [PERFORM] cpu_tuple_cost
Greg Sabino Mullane [EMAIL PROTECTED] writes:
> On that note, can I raise the idea again of dropping the default value for random_page_cost in postgresql.conf? I think 4 is too conservative in this day and age. Certainly the person who will be negatively impacted by a default drop of 4 to 3 will be the exception and not the rule.

The ones who'd be negatively impacted are the ones we haven't been hearing from ;-). To assume that they aren't out there is a logical fallacy. I still think that 4 is about right for large databases (where "large" is in comparison to available RAM).

Also, to the extent that we think these numbers mean anything at all, we should try to keep them matching the physical parameters we think they represent. I think that the "reduce random_page_cost" mantra is not an indication that that parameter is wrong, but that the cost models it feeds into need more work.

One thing we *know* is wrong is the costing of nestloop inner indexscans: there needs to be a correction for caching of index blocks across repeated scans. I've looked at this a few times but not come up with anything that seemed convincing. Another thing I've wondered about more than once is whether we shouldn't discount fetching of higher-level btree pages on the grounds that they're probably in RAM already, even if the indexscan isn't inside a loop.

regards, tom lane
Re: [PERFORM] cpu_tuple_cost
Greg Sabino Mullane wrote:
>> Reducing random_page_cost is usually the best way to get the planner to favor indexscans more.
>
> On that note, can I raise the idea again of dropping the default value for random_page_cost in postgresql.conf? I think 4 is too conservative in this day and age. Certainly the person who will be negatively impacted by a default drop of 4 to 3 will be the exception and not the rule.

Agreed. I think we should reduce it at least to 3.

-- Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [PERFORM] cpu_tuple_cost
Bruce Momjian pgman@candle.pha.pa.us wrote:
> Agreed. I think we should reduce it at least to 3.

Note that changing it from 4 to 3 or even 2 is unlikely to really change much. Many of the plans people complain about turn out to have critical points closer to 1.2 or 1.1. The only reason things work out better with such low values is because people have data sets that fit more or less entirely in RAM, so values close to 1, or even equal to 1, actually represent the reality.

The "this day and age" argument isn't very convincing. Hard drive capacity growth has far outstripped hard drive seek time and bandwidth improvements. Random access has more penalty than ever.

-- greg
Re: [PERFORM] cpu_tuple_cost
Greg,

> On that note, can I raise the idea again of dropping the default value for random_page_cost in postgresql.conf? I think 4 is too conservative in this day and age. Certainly the person who will be negatively impacted by a default drop of 4 to 3 will be the exception and not the rule.

I don't agree. The defaults are there for people who aren't going to read enough of the documentation to set them. As such, conservative defaults are appropriate. If we were going to change anything automatically, it would be to set effective_cache_size to 1/3 of RAM at initdb time. However, I don't know any method to determine RAM size that works on all the platforms we support.

Tom,

> Also, to the extent that we think these numbers mean anything at all, we should try to keep them matching the physical parameters we think they represent.

Personally, what I would love to see is the system determining and caching some of these parameters automatically. For example, in a database which has been running in production for a couple of days, it should be possible to determine the ratio of average random-seek tuple cost to average seq-scan tuple cost.

Other parameters should really work the same way. effective_cache_size, for example, is a blunt instrument to replace what the database should ideally do through automated interactive fine-tuning. Particularly since we have 2 separate caches (or 3, if you count T1 and T2 from 2Q). What the planner really needs to know is: is this table or index already in the T1 or T2 cache (can't we determine this?)? How likely is it to be in the filesystem cache? The latter question is not just one of size (table < memory), but one of frequency of access.

Of course, this stuff is really, really hard, which is why we rely on the GUCs ...

-- Josh Berkus Aglio Database Solutions San Francisco
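Josh's "set effective_cache_size to 1/3 of RAM at initdb time" idea can be sketched for a single platform. This reads /proc/meminfo, which is Linux-only and is exactly the portability problem he raises; the sample MemTotal line is made up for illustration:

```python
# Sketch: derive an effective_cache_size value (in 8kB pages, the GUC's
# historical unit) as 1/3 of total RAM. Linux-specific -- /proc/meminfo
# does not exist on the other platforms PostgreSQL supports.

def effective_cache_size_pages(meminfo_text, page_kb=8, fraction=1/3):
    """Return fraction-of-RAM expressed in page_kb-sized pages."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])   # e.g. "MemTotal: 4046532 kB"
            return int(total_kb * fraction / page_kb)
    raise ValueError("MemTotal not found in meminfo text")

sample = "MemTotal:        4046532 kB\nMemFree:          123456 kB\n"
print(effective_cache_size_pages(sample))  # 1/3 of ~4GB in 8kB pages -> 168605
```

In real use the text would come from `open("/proc/meminfo").read()`; the hardcoded sample just keeps the sketch self-contained.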
Re: [PERFORM] cpu_tuple_cost
I have forgotten this:

CREATE OR REPLACE FUNCTION date_to_yearmonth_dec(TIMESTAMP)
RETURNS VARCHAR AS '
BEGIN
  -- the subtraction binds to the month extract, not to the concatenation
  RETURN extract(year FROM $1) || (extract(month FROM $1) - 1);
END' LANGUAGE plpgsql IMMUTABLE;
Re: [PERFORM] cpu_tuple_cost
Daniel Schuchardt [EMAIL PROTECTED] writes:
> i have a query plan that is bad with standard cpu_tuple_costs and good if I raise cpu_tuple_costs. Is it good practice to raise them if i want to force postgres to use indexes more often?

Reducing random_page_cost is usually the best way to get the planner to favor indexscans more.

regards, tom lane