Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Sean Chittenden <[EMAIL PROTECTED]> writes:
> > If you CLUSTER on an index and then ANALYSE, you get a correlation of
> > 1.0 (== optimum) for the first column of the index.
>
> Correlation of what to what? Of data to nearby data? Of data to related
> data (i.e., a multi-column index)? Of related data to pages on disk?
> Not 100% sure in what context you're using the word "correlation"...

The correlation is between index order and heap order --- that is, are the tuples in the table physically in the same order as the index? The better the correlation, the fewer heap-page reads it will take to do an index scan.

Note it is possible to measure correlation without regard to whether there actually is any index; ANALYZE is simply looking to see whether the values appear in increasing order according to the datatype's default sort operator.

One problem we have is extrapolating from the single-column correlation stats computed by ANALYZE to appropriate info for multi-column indexes. It might be that the only reasonable fix for this is for ANALYZE to compute multi-column stats too when multi-column indexes are present. People are used to the assumption that you don't need to re-ANALYZE after creating a new index, but maybe we'll have to give that up.

> But that value will degrade over time, and at what rate? Does ANALYZE
> maintain that value so that it's kept accurate?

You keep it up to date by ANALYZE-ing at suitable intervals. It's no different from any other statistic.

			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
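To make Tom's definition concrete, the statistic he describes is exposed in the pg_stats view. A hypothetical session (table and index names are invented; CLUSTER uses the 7.x-era "CLUSTER index ON table" spelling):

```sql
-- Invented example: a table whose rows arrive in roughly random evtime order.
CREATE TABLE events (evtime timestamptz, payload text);
CREATE INDEX events_evtime_idx ON events (evtime);

-- Physically rewrite the heap in index order, then refresh statistics.
CLUSTER events_evtime_idx ON events;
ANALYZE events;

-- A correlation close to 1.0 means heap order tracks index order,
-- so an index scan reads mostly sequential heap pages.
SELECT attname, correlation
  FROM pg_stats
 WHERE tablename = 'events' AND attname = 'evtime';
```

As the thread notes, the number drifts as later inserts and updates land out of index order, so it only stays honest if ANALYZE is rerun periodically.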
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
>> is some other problem that needs to be solved. (I'd wonder about
>> index correlation myself; we know that that equation is pretty bogus.)
>
> Could be. I had him create a multi-column index on the date and a
> non-unique, highly redundant id.

Tom has already suspected index correlation to be a possible source of the problem and recommended to CLUSTER on the index. A weakness of the current planner implementation is that a multi-column index is always thought to have low correlation. In your case, even after CLUSTER, the 2-column index on (date, sensorid) is treated like a single-column index with correlation 0.5.

> Howdy. Well, I got far enough with the guy in the testing to figure out
> that it wasn't a single vs multi-column index problem, however I
> haven't heard back from him regarding the use of CLUSTER. C'est la
> IRC. :-p

I have an experimental patch lying around somewhere that tries to work around these problems by offering different estimation methods for index scans. If you are interested, I'll dig it out.

> Sure, I'll take a gander... I've had my head in enough Knuth recently
> to hopefully have some kind of a useful response to the patch.

In the meantime have him try with a single-column index on date.

> Been there, done that: no change.
>
> | That's one heck of a poor estimate for the number of rows returned.
> |
> |  -> Seq Scan on mss_fwevent (cost=0.00..223312.60 rows=168478 width=12)
> |     (actual time=24253.66..24319.87 rows=320 loops=1)
>
>  -> Index Scan using mss_fwevent_evtime_sensorid_idx on mss_fwevent
>     (cost=0.00..2442524.70 rows=168478 width=12)
>     (actual time=68.36..132.84 rows=320 loops=1)
>     Index Cond: ((evtime > (now() - '06:00'::interval)) AND (sensorid = 7))
>     Filter: (NOT action)

The estimated number of rows being wrong by a factor of 500 seems to be the main problem, hiding everything else. With statistics already set to 1000, does this mean that sensorid, evtime, and action are not independent? It'd be interesting to know whether the estimation error comes from the Index Cond or from the Filter.
Hrm... sensorid is a sequence and grows proportionally with evtime, obviously. Action is a char(1) or something like that (ie: not unique). See the EXPLAIN ANALYZEs that I posted in msgid: [EMAIL PROTECTED] or at the bottom of this msg.

Having spent a fair amount of time looking at the two following plans, it seems as though an additional statistic is needed to change the cost of doing an index lookup when the index is linearly ordered. Whether CLUSTER does this or not, I don't know; I never heard back from him after getting the runtime down to a few ms. :-/ Are indexes on linearly ordered data rebalanced somehow? I thought CLUSTER only reordered data on disk. -sc

Plan for normal random_page_cost:

mss_masterlog=> SHOW random_page_cost;
 random_page_cost
------------------
 4
(1 row)

mss_masterlog=> EXPLAIN ANALYZE SELECT srca, COUNT(srca) FROM mss_fwevent WHERE
mss_masterlog->      sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL)
mss_masterlog->      AND NOT action GROUP BY srca ORDER BY COUNT DESC LIMIT 20;

                              QUERY PLAN
------------------------------------------------------------------------------
 Limit (cost=240384.69..240384.74 rows=20 width=12) (actual time=24340.04..24340.39 rows=20 loops=1)
   -> Sort (cost=240384.69..240426.80 rows=16848 width=12) (actual time=24340.02..24340.14 rows=21 loops=1)
        Sort Key: count(srca)
        -> Aggregate (cost=237938.36..239201.95 rows=16848 width=12) (actual time=24322.84..24330.73 rows=23 loops=1)
             -> Group (cost=237938.36..238780.75 rows=168478 width=12) (actual time=24322.57..24328.45 rows=320 loops=1)
                  -> Sort (cost=237938.36..238359.55 rows=168478 width=12) (actual time=24322.55..24324.34 rows=320 loops=1)
                       Sort Key: srca
                       -> Seq Scan on mss_fwevent (cost=0.00..223312.60 rows=168478 width=12) (actual time=24253.66..24319.87 rows=320 loops=1)
                            Filter: ((sensorid = 7) AND (evtime > (now() - '06:00'::interval)) AND (NOT action))
 Total runtime: 24353.67 msec
(10 rows)

Plan for altered random_page_cost:

mss_masterlog=> SET random_page_cost = 0.32;
SET
mss_masterlog=> EXPLAIN ANALYZE SELECT srca, COUNT(srca) FROM mss_fwevent WHERE
mss_masterlog->      sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL)
mss_masterlog->      AND NOT action GROUP BY srca ORDER BY COUNT DESC LIMIT 20;

                              QUERY PLAN
------------------------------------------------------------------------------
 Limit (cost=227274.85..227274.90 rows=20 width=12) (actual
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Michael Pohl wrote:
> On Sun, 6 Jul 2003, Matthew Nuzum wrote:
> > At the very least, if there is good documentation for these
> > parameters, maybe the conf file should provide a link to this info.
>
> I believe that is what Josh is proposing:
> http://archives.postgresql.org/pgsql-performance/2003-07/msg00102.php
>
> > [Apache httpd] uses a three-phase (if not more) documentation level.
> > The .conf file contains detailed instructions in an easy-to-read and
> > not-too-jargon-ish structure. The docs provide detailed tutorials and
> > papers that expand on configuration params in an easy-to-read format.
> > Both of these refer to the thorough reference manual that breaks each
> > possible option down into its nitty-gritty details so that a user can
> > get more information if they so desire.
>
> I agree that Apache's approach is primo. Often the .conf comments are
> enough to jog my memory about a directive I haven't used for a while.
> Or the comments are enough to let me know I don't need a directive, or
> that I need to go to the manual and read more. I appreciate that.

Isn't that what we have now --- isn't postgresql.conf clear enough to jog people's memory?

--
 Bruce Momjian          | http://candle.pha.pa.us
 [EMAIL PROTECTED]      | (610) 359-1001
 + If your life is a hard drive, | 13 Roberts Road
 + Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
>> Are you willing to say that the PostgreSQL database system should only
>> be used by DBAs? I believe that Postgres is such a good and useful
>> tool that anyone should be able to start using it with little or no
>> barrier to entry.
>
> I quite agree. But there is a difference between saying "you should get
> decent performance with no effort" and "you should get optimal
> performance with no effort". I think we can get to the first with
> relatively little trouble (like boosting the default shared_buffers to
> 1000), but the second is an impractical goal.

Just wanted to repeat some of the thoughts already expressed. There is no reason why PostgreSQL shouldn't be reasonably well configured for a particular platform out of the box --- not for maximum performance, but for good-enough performance. The many complaints by new users about PostgreSQL being surprisingly slow, and all the standard answers (vacuum, pump up memory settings), IMHO prove that the default installation can be improved. Already mentioned in the mailing lists: using multiple standard conf files, querying system info and dynamically generating all or some parts of the conf file, automating the vacuum process...

Kaarel
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
On Wed, 9 Jul 2003, Kaarel wrote:
> > > Are you willing to say that the PostgreSQL database system should
> > > only be used by DBAs? I believe that Postgres is such a good and
> > > useful tool that anyone should be able to start using it with
> > > little or no barrier to entry.
> >
> > I quite agree. But there is a difference between saying "you should
> > get decent performance with no effort" and "you should get optimal
> > performance with no effort". I think we can get to the first with
> > relatively little trouble (like boosting the default shared_buffers
> > to 1000), but the second is an impractical goal.
>
> Just wanted to repeat some of the thoughts already expressed. There is
> no reason why PostgreSQL shouldn't be reasonably well configured for a
> particular platform out of the box --- not for maximum performance, but
> for good-enough performance. The many complaints by new users about
> PostgreSQL being surprisingly slow, and all the standard answers
> (vacuum, pump up memory settings), IMHO prove that the default
> installation can be improved. Already mentioned in the mailing lists:
> using multiple standard conf files, querying system info and
> dynamically generating all or some parts of the conf file, automating
> the vacuum process...

It would be nice to have a program that could run on any OS postgresql runs on and could report on the current limits of the kernel, and make recommendations for changes the admin might want to make. One could probably make a good stab at effective cache size during install. Anything reasonably close would probably help. Report what % of said resources could be consumed by postgresql under various circumstances...
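A first cut at such a reporter could be a plain shell script. A sketch (the sysctl names and /proc paths are platform assumptions, and the "half of RAM as OS cache" guess is only a placeholder heuristic; 8192 is the usual 8 kB page size):

```shell
#!/bin/sh
# Sketch: report OS settings relevant to PostgreSQL tuning, where available.
# Linux exposes these under /proc; the BSDs expose similar values via sysctl.

report() {
    # $1 = label, $2 = value (may be empty if unsupported on this OS)
    if [ -n "$2" ]; then
        echo "$1: $2"
    else
        echo "$1: unknown on this platform"
    fi
}

case `uname` in
Linux)
    shmmax=`cat /proc/sys/kernel/shmmax 2>/dev/null`
    memtotal=`awk '/^MemTotal:/ {print $2 * 1024}' /proc/meminfo 2>/dev/null`
    ;;
FreeBSD|OpenBSD|NetBSD)
    shmmax=`sysctl -n kern.ipc.shmmax 2>/dev/null`
    memtotal=`sysctl -n hw.physmem 2>/dev/null`
    ;;
esac

report "SHMMAX (bytes)" "$shmmax"
report "Physical memory (bytes)" "$memtotal"

# Rough placeholder guess: assume about half of RAM ends up as OS disk
# cache, and express it in 8 kB pages for effective_cache_size.
if [ -n "$memtotal" ]; then
    echo "suggested effective_cache_size: $(($memtotal / 2 / 8192)) pages"
fi
```

An installer could run this and fold the printed suggestions into the generated postgresql.conf, or simply show them to the admin.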
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Scott Marlowe wrote:
> It would be nice to have a program that could run on any OS postgresql
> runs on and could report on the current limits of the kernel, and make
> recommendations for changes the admin might want to make. One could
> probably make a good stab at effective cache size during install.
> Anything reasonably close would probably help. Report what % of said
> resources could be consumed by postgresql under various
> circumstances...

One of the issues that automating the process would encounter is limits in the kernel that are too low for PostgreSQL to handle. The BSDs come to mind, where values need to be manually increased in the kernel before you can reach a reasonable maximum connection count. Another example is how OpenBSD will outright crash when trying to test the database at install time: it seems that most of the tests fail because the maximum number of processes allowed is too low for the test to succeed, while FreeBSD works just fine on those same tests.

If PostgreSQL automates the configuration, that would be a plus. But it should also detect the platform and inform the person that these changes should be made to the kernel, sysctl or whatever in order to have that configuration run. Perl may be useful here for a few reasons: it's portable enough to run on multiple Unix variants, and its tools are fairly standard, so the code would require fewer special cases for more exotic implementations.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
> I don't have much to add because I'm pretty new to Postgres and have
> been soliciting advice here recently, but I totally agree with
> everything you said. I don't mind if it's in the postgres.conf file or
> in a faq that is easy to find, I just would like it to be in one place.
> A good example of the need for this is when I was tuning
> effective_cache I thought that was creating a cache for Postgres when
> in fact, as it was pointed out to me, it's just hinting to postgres the
> size of the OS cache. Lots of ways for people to get really confused
> here.

I looked through src/doc/runtime.sgml for a good place to stick this and couldn't find a place that seemed appropriate, but on FreeBSD, this can be determined with a great deal of precision in a programmatic manner:

	echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same OID is available via C too. It'd be slick if PostgreSQL could tune itself (on FreeBSD) at initdb time with the above code. If Linux exports this info via /proc and can whip out the appropriate magic, even better. An uncommented-out good guess that shows up in postgresql.conf would be stellar and quite possible with the use of sed.

Maybe an initdb switch could be added to have initdb tune the config it generates? If a -n is added, have it generate a config and toss it to stdout?

	case `uname` in
	"FreeBSD")
		echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
		;;
	*)
		echo "Unable to automatically determine the effective cache size" > /dev/stderr
		;;
	esac

-sc

--
Sean Chittenden
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Sean Chittenden wrote:
> I looked through src/doc/runtime.sgml for a good place to stick this
> and couldn't find a place that seemed appropriate, but on FreeBSD, this
> can be determined with a great deal of precision in a programmatic
> manner:
>
>	echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
>
> The same OID is available via C too. It'd be slick if PostgreSQL could
> tune itself (on FreeBSD) at initdb time with the above code. If Linux
> exports this info via /proc and can whip out the appropriate magic,
> even better. An uncommented-out good guess that shows up in
> postgresql.conf would be stellar and quite possible with the use of
> sed.
>
> Maybe an initdb switch could be added to have initdb tune the config it
> generates? If a -n is added, have it generate a config and toss it to
> stdout?
>
>	case `uname` in
>	"FreeBSD")
>		echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
>		;;
>	*)
>		echo "Unable to automatically determine the effective cache size" > /dev/stderr
>		;;
>	esac
>
> -sc

The simplest way may be to create an 'auto-tune' directory with scripts for configured platforms. When postgres installs the databases, it checks for 'tune.xxx' and, if found, uses that to generate the settings itself. This would allow for defaults on platforms that do not have a tuning script and optimization for those that do.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
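Martin's per-platform directory idea might look something like the sketch below (the directory layout and the tune.<uname> file names are invented for illustration, not an existing initdb feature):

```shell
#!/bin/sh
# Sketch: at initdb time, look for a platform-specific tuning script
# (e.g. auto-tune/tune.FreeBSD) and use its output; otherwise fall back
# to the stock conservative defaults.

emit_tuned_settings() {
    # $1 = directory holding per-platform tuner scripts
    tuner="$1/tune.`uname`"
    if [ -x "$tuner" ]; then
        # The platform script prints "name = value" lines that initdb
        # would append to the generated postgresql.conf.
        "$tuner"
    else
        echo "# no tuner for `uname`; keeping conservative defaults"
    fi
}

emit_tuned_settings "${TUNE_DIR:-./auto-tune}"
```

A FreeBSD tuner dropped into that directory could then be exactly Sean's one-liner, while platforms without a script keep the shipped defaults.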
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Matthew Nuzum wrote:
> > I'm highly resistant to/disappointed in this attitude and firmly
> > believe that there are well-understood algorithms that DBAs use to
> > diagnose and solve performance problems. It's only a black art
> > because it hasn't been documented. Performance tuning isn't voodoo;
> > it's adjusting constraints to align with the execution of
> > applications, and we know what the applications do, therefore the
> > database can mold to the applications' needs.
>
> I agree. We often seem to forget simple lessons in human nature.
> Expecting someone to spend 20 extra seconds to do something is often
> too much. In many cases, the only manual that a person will see is the
> .conf files.

In my opinion, a serious RDBMS will *always* require the admin to do some research in order to learn how to use it effectively. We are not talking about a word processor here. That being said, I think a good part of the problem is that admins don't know where to look for the appropriate documentation and what is needed. Expecting someone to spend 20 seconds looking for a piece of info is not too bad, but expecting them to spend hours trying to figure out what info is relevant is not going to get us anywhere.

For those who have been following the discussion relating to MySQL vs PostgreSQL, I think this is relevant here. MySQL does much of its tuning at compile time, and the MySQL team very carefully controls the build process for the various binary distributions they offer. If you want to see a real mess, try compiling MySQL from source. Talk about having to read documentation on items which *should* be handled by the configure script. OTOH, PostgreSQL is optimized using configuration files and is tunable on the fly. This is, I think, a better approach, but it needs to be better documented. Maybe a "Beginner's guide to database server tuning" or something like that.

Secondly, documenting the tuning algorithms well may allow PostgreSQL to automatically tune itself to some extent, or allow the development of performance-tuning tools for the server. This would be a big win for the project. Unfortunately I am not knowledgeable enough on this topic to really do it justice.

Best Wishes,
Chris Travers
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Brian suggests:
> I'm curious how many of the configuration values can be determined
> automatically, or with the help of some script. It seems like there
> could be some perl script in contrib that could help figure this out.
> Possibly you are asked a bunch of questions and then the values are
> computed based on that. Something like:

This would be great! Wanna be in charge of it? Is there a to-do list for this kind of stuff? Maybe there could be a "help wanted" sign on the website. Seems like there are lots of good ideas that fly around here but never get followed up on.

Additionally, I have an increasingly large production database that I would be willing to do some test cases on. I don't really know how to do it, though... If someone were able to give instructions, I could run tests on three different platforms.

Matthew Nuzum | Makers of "Elite Content Management System"
www.followers.net | View samples of Elite CMS in action
[EMAIL PROTECTED] | http://www.followers.net/portfolio/
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
On Sun, 6 Jul 2003, Matthew Nuzum wrote:
> At the very least, if there is good documentation for these parameters,
> maybe the conf file should provide a link to this info.

I believe that is what Josh is proposing:

http://archives.postgresql.org/pgsql-performance/2003-07/msg00102.php

> [Apache httpd] uses a three-phase (if not more) documentation level.
> The .conf file contains detailed instructions in an easy-to-read and
> not-too-jargon-ish structure. The docs provide detailed tutorials and
> papers that expand on configuration params in an easy-to-read format.
> Both of these refer to the thorough reference manual that breaks each
> possible option down into its nitty-gritty details so that a user can
> get more information if they so desire.

I agree that Apache's approach is primo. Often the .conf comments are enough to jog my memory about a directive I haven't used for a while. Or the comments are enough to let me know I don't need a directive, or that I need to go to the manual and read more. I appreciate that.

michael
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Michael Pohl wrote:
> On Sun, 6 Jul 2003, Matthew Nuzum wrote:
> > At the very least, if there is good documentation for these
> > parameters, maybe the conf file should provide a link to this info.
>
> I believe that is what Josh is proposing:
> http://archives.postgresql.org/pgsql-performance/2003-07/msg00102.php
>
> > [Apache httpd] uses a three-phase (if not more) documentation level.
> > The .conf file contains detailed instructions in an easy-to-read and
> > not-too-jargon-ish structure. The docs provide detailed tutorials and
> > papers that expand on configuration params in an easy-to-read format.
> > Both of these refer to the thorough reference manual that breaks each
> > possible option down into its nitty-gritty details so that a user can
> > get more information if they so desire.
>
> I agree that Apache's approach is primo. Often the .conf comments are
> enough to jog my memory about a directive I haven't used for a while.
> Or the comments are enough to let me know I don't need a directive, or
> that I need to go to the manual and read more. I appreciate that.

One thing that may also help is to include more sample .conf files. For example, you could include settings that would commonly be seen for a dedicated database server with generic specs, and another for a host with fewer resources that is not dedicated to Postgres. This would let users see how certain setting changes work together. The default .conf is great if you want to set up a small test bed, but for a real-life deployment chances are it won't be exactly what you're looking for.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
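To illustrate the idea, a shipped set of samples might look like the sketch below (every number is an invented placeholder for a 2003-era machine, not a recommendation; shared_buffers and effective_cache_size are counted in 8 kB pages, sort_mem in kB):

```
# postgresql.conf.dedicated -- host devoted entirely to PostgreSQL
shared_buffers = 4096            # 32 MB of shared buffers
sort_mem = 8192                  # 8 MB per sort
effective_cache_size = 65536     # assume ~512 MB of OS disk cache

# postgresql.conf.shared -- low-memory host running other services too
shared_buffers = 512             # 4 MB of shared buffers
sort_mem = 1024                  # 1 MB per sort
effective_cache_size = 4096      # assume ~32 MB of OS disk cache
```

A user could then copy whichever file is closest to their situation and tweak from there, instead of starting from the ultra-conservative defaults.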
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
On Thu, 3 Jul 2003, Sean Chittenden wrote:
> What are the odds of going through and revamping some of the tunables
> in postgresql.conf for the 7.4 release? I was just working with someone
> on IRC and on their 7800 RPM IDE drives, their random_page_cost was
> ideally suited to be 0.32: a far cry from 4. Doing so has been a win
> across the board and the problem query went from about 40sec (seq scan)
> down to 0.25ms (using idx; higher than 0.32 resulted in a query time
> jump to 2sec, and at 0.4 it went back up to a full seq scan at 40sec).

I'm the guy who advocates settings of 1 to 2, and that still sounds low to me. :-) I'm wondering if the effective_cache_size was set properly, as well as whether enough buffers were allocated. I generally set effective_cache_size to 100,000 pages (800 megs or so) on my box, which is where it sits most days. With this setting I've found that values under 1 are not usually necessary to force the planner to take the path of righteousness (i.e. the fastest one :-); 1.2 to 1.4 are optimal to me.

Since theoretically a random_page_cost of 1 means no penalty to move the heads around, and there's ALWAYS a penalty for moving the heads around, we have to assume:

1: That either the planner is making poor decisions on some other
   variable, and we can whack the planner in the head with a really low
   random page cost.

OR

2: The other settings are suboptimal (buffers, sort_mem,
   effective_cache_size, etc...) and lowering random page costs helps
   there.

I've always wondered if most performance issues aren't a bit of both. The answer, of course, is fixing the planner so that a random_page_cost of anything less than 1 would never be needed, since by design anything under 1 represents a computer that likely doesn't exist (in theory, of course). A 1 would be a machine that was using solid-state hard drives and had the same cost in terms of OS paths for random accesses as for sequential ones.

What constants in the planner, and/or formulas, would be the likely culprits, I wonder? I've wandered through that page and wasn't sure what to play with.

> I know Josh is working on revamping the postgresql.conf file, but would
> it be possible to include suggested values for various bits of hardware
> and then solicit contributions from admins on this list who have tuned
> their DB correctly?
>
> ## random_page_cost -- units are one sequential page fetch cost
> #random_page_cost = 4		# default - very conservative
> #random_page_cost = 0.9	# IDE 5200 RPM, 8MB disk cache
> #random_page_cost = 0.3	# IDE 7800 RPM, 4MB disk cache
> #random_page_cost = 0.1	# SCSI RAID 5, 10,000RPM, 64MB cache
> #random_page_cost = 0.05	# SCSI RAID 1+0, 15,000RPM, 128MB cache
> #...
>
> ## next_hardware_dependent_tunable
> #hardware_dependent_tunable
>
> I know these tables could get somewhat lengthy or organized
> differently, but given the file is read _once_ at _startup_, seen by
> thousands of DBAs, is visited at least once for every installation (at
> the least to turn on TCP connections), and is often the only file other
> than pg_hba.conf that gets modified or looked at, this could be a very
> nice way of introducing DBAs to tuning PostgreSQL and reducing the
> number of people crying "PostgreSQL's slow." Having postgresql.conf be
> a clearing house for tunable values for various bits of hardware would
> be a huge win for the community and would hopefully radically change
> this database's perception. At the top of the file, it would be useful
> to include a blurb to the effect of:
>
> # The default values for PostgreSQL are extremely conservative and are
> # likely far from ideal for a site's needs. Included in this
> # configuration, however, are _suggested_ values to help aid in
> # tuning. The values below are not authoritative, merely contributed
> # suggestions from PostgreSQL DBAs and committers who have
> # successfully tuned their databases. Please take these values as
> # advisory only and remember that they will very likely have to be
> # adjusted according to your site's specific needs. If you have a
> # piece of hardware that isn't mentioned below and have tuned your
> # configuration aptly and have found a suggested value that the
> # PostgreSQL community would benefit from, please send a description
> # of the hardware, the name of the tunable, and the tuned value to
> # [EMAIL PROTECTED] to be considered for inclusion in future
> # releases.
> #
> # It should also go without saying that the PostgreSQL Global
> # Development Group and its community of committers, contributors,
> # administrators, and commercial supporters are absolved from any
> # responsibility or liability with regards to the use of its software
> # (see this software's license for details). Any data loss,
> # corruption, or performance degradation is the responsibility of the
> # individual or group of individuals using/managing this installation.
> #
> # Hints to DBAs:
> #
> # *) Setup a regular backup schedule (hint: