[PERFORM] monitoring suggestions
PostgreSQL: 9.1
OS: Red Hat 6

This PostgreSQL instance serves dynamic web content and runs on a dedicated server, so I need some PostgreSQL monitoring advice. There are two basic strategies that I am aware of for configuring PostgreSQL:

1) In memory: you give PostgreSQL 70% or more of the memory by setting shared_buffers high, relying on PostgreSQL itself to hold the database in memory. As I understand it, the only disk access should then be the initial read of data into a buffer and the writes when data blocks are updated. The advantage of this strategy is that if you notice the Linux swap file growing, you know you need to add memory to the server as well as to PostgreSQL.

2) Disk caching: you rely on the operating system to cache disk files in memory, and PostgreSQL reads the data it needs through the OS disk cache. To use this strategy you set shared_buffers to a low value, 1 GB or less, and set effective_cache_size to the amount of memory you expect the OS to use for disk caching.

The only major drawback for me with strategy #2 is: how do I know when the OS needs more memory for caching my files? If I were to use option #2, what type of monitoring would you suggest to tell me when I need to add more memory?

Thanks,
Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382
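A hedged starting point for the monitoring question above: from inside the database you can only see how often reads were satisfied from shared_buffers; whether the remaining reads were served by the OS cache or by the disk has to come from OS tools (free, vmstat, sar). Still, a PostgreSQL-level hit ratio that falls over time is a useful early warning that the working set has outgrown memory. A minimal sketch against the standard 9.1 statistics views:

    -- Buffer cache hit ratio for the current database; sample it on a
    -- schedule and watch the trend rather than any single reading.
    SELECT datname,
           blks_hit,
           blks_read,
           round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
    FROM pg_stat_database
    WHERE datname = current_database();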
[PERFORM] Tablespaces on a raid configuration
PostgreSQL 9.0.x

When PostgreSQL storage lives on a relatively large RAID 5 or 6 array, is there any value in distributing your tables across multiple tablespaces if those tablespaces exist on the same RAID array? I understand the value of having the tablespaces on different RAID arrays, but what about on the same one?

Thanks,
Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382
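For readers new to the mechanics under discussion, a hedged sketch (the tablespace name, path, and table are made up for illustration): creating a second tablespace and moving a table onto it. The question above is whether this buys anything when both locations sit on the same array:

    CREATE TABLESPACE ts_reports LOCATION '/raid_array/pg_ts_reports';
    ALTER TABLE report_data SET TABLESPACE ts_reports;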
[PERFORM] database slowdown while a lot of inserts occur
PostgreSQL 9.0.x

We have around ten different applications that use the same database. When one particular application is active it does an enormous number of inserts, each of them very small. During this time the database seems to slow down in general, even though the application in question is inserting into a table that is not used by the other applications.

1) What should I do to confirm that the database is the issue and not the applications?
2) How can I identify where the bottleneck is occurring if the issue happens to be with the database?

I have been using PostgreSQL for eight years. It is an amazing database.

Thanks,
Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382
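Two hedged starting points for question #2, using the standard 9.0 statistics views: check whether other sessions are stuck waiting on locks while the insert storm runs, and whether the inserts are forcing frequent requested checkpoints, a common cause of across-the-board slowdowns (column names are as of 9.0; procpid and current_query were renamed in later releases):

    -- Which sessions are blocked on locks right now?
    SELECT procpid, waiting, current_query
    FROM pg_stat_activity
    WHERE waiting;

    -- Are checkpoints being requested faster than checkpoint_timeout?
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
    FROM pg_stat_bgwriter;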
Re: [PERFORM] database slowdown while a lot of inserts occur
I forgot to mention that the slowdown for the other applications shows up in particular when they try to insert into or update tables unrelated to the application, mentioned in my prior message, that does the massive number of small inserts.

Thanks,
Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382

From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Campbell, Lance
Sent: Thursday, March 29, 2012 12:59 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] database slowdown while a lot of inserts occur

> PostgreSQL 9.0.x We have around ten different applications that use the same database. [...]
Re: [PERFORM] How much memory is PostgreSQL using
Greg,
Thanks for your help.

1) How does the number of buffers reported by pg_buffercache translate into memory (buffers * X = Y MB)?
2) Is there a way to tell how many total buffers I have available/max?

Thanks,
Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382

-----Original Message-----
From: Greg Smith [mailto:g...@2ndquadrant.com]
Sent: Monday, March 29, 2010 11:54 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How much memory is PostgreSQL using

Campbell, Lance wrote:
> Or is there some way to ask PostgreSQL how much memory are you using to cache disk blocks currently?

You can install contrib/pg_buffercache into each database and count how many used blocks are there. Note that running queries using that diagnostic tool is really intensive due to the locks it takes, so be careful not to do that often on a production system.

> When you do a pg_dump does PostgreSQL put the disk blocks into shared buffers as it runs?

To some extent. Most pg_dump activity involves sequential scans that read an entire table. Those are no different from any other process that puts disk blocks into shared_buffers. However, that usage pattern makes pg_dump particularly likely to run into an optimization in 8.3 and later that limits how much of shared_buffers is used when sequentially scanning a large table. See p. 10 of http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf for the exact implementation. Basically, anything bigger than shared_buffers / 4 uses a 256 kB ring to limit its cache use, but it's a little more complicated than that.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
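A hedged sketch answering the two questions above, assuming contrib/pg_buffercache is installed and the default 8 kB block size:

    -- 1) Buffers in use, converted to megabytes (buffers * 8 kB):
    SELECT count(*) AS buffers_used,
           count(*) * 8 / 1024 AS approx_mb
    FROM pg_buffercache
    WHERE relfilenode IS NOT NULL;

    -- 2) Total buffers available = the shared_buffers setting:
    SELECT setting AS total_buffers, unit
    FROM pg_settings
    WHERE name = 'shared_buffers';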
[PERFORM] How much memory is PostgreSQL using
PostgreSQL 8.4.3
OS: Linux Red Hat 4.x

I recently changed my PostgreSQL strategy to use a large segment of memory for shared buffers, with the idea of caching disk blocks. How can I see how much memory PostgreSQL is using for this?

I tried:

    ps aux | grep post | sort -k4

This lists the processes using memory at the bottom. Are they sharing memory, or does each use its own blocks of memory? When I run top I see that VIRT is the value of my shared buffers plus a tiny bit, but %MEM is only 2.4%, 2.6%, 1.0%, 1.5%, and 1.1% for the running processes. Do I add these percentages up to see how much of VIRT I am really using? Or is there some way to ask PostgreSQL how much memory it is currently using to cache disk blocks?

When you do a pg_dump does PostgreSQL put the disk blocks into shared buffers as it runs?

Thanks,
Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382
[PERFORM] memory question
PostgreSQL 8.4.3
Linux Red Hat 5.0

Question: how much memory do I really need? From my understanding there are two primary strategies for setting up PostgreSQL in relation to memory:

1) Rely on Linux to cache the files. In this approach you set shared_buffers to a relatively low number.
2) Set shared_buffers to a very large percentage of your memory so that PostgreSQL reserves the memory for the database.

I am currently using option #1. I have 24 GB of memory on my server and the database takes up 17 GB of disk space. When I run the Linux command top I notice that 19 GB is allocated for cache. Is there a way to tell how much of that cache is associated with the caching of database files?

I am basically asking: how much memory do I really need? Maybe I have complete overkill, or maybe I am getting to a point where I might need more memory. My thought was that I could use option #2 and set the number to a lower amount; if the performance is bad, slowly work the number up. Our server manager seems to think that I have way too much memory and that we only need 5 GB. I don't really believe that, but I want to cover myself. With money tight I don't want to be the person who is wasting resources, and since we need to replace our database servers I want to do the right thing.

Thanks,
Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382
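One hedged data point available from inside the database: how big the data actually is, for comparison against the 19 GB the OS is holding in cache (which files the OS cache contains would need an OS-level tool such as fincore):

    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- The largest tables, including their indexes and TOAST data:
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 10;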
[PERFORM] Best settings to load a fresh database
PostgreSQL 8.3
Linux Red Hat 4.x
24 GB of memory

When loading a file generated by pg_dumpall, is there a key setting in the configuration file that would make the load run faster?

Thanks,
Lance Campbell
Project Manager/Software Architect/DBA
Web Services at Public Affairs
217-333-0382
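There is no single silver-bullet setting, but a commonly suggested combination for a one-off restore looks like the sketch below. The values are hedged examples for a 24 GB machine; fsync = off is only sane while loading data you can recreate from the dump, and autovacuum should be re-enabled (and the database ANALYZEd) afterwards:

    maintenance_work_mem = 1GB   # speeds up the CREATE INDEX steps
    checkpoint_segments = 64     # fewer, larger checkpoints during the load
    autovacuum = off             # re-enable after the restore
    fsync = off                  # ONLY for a throwaway initial load
    archive_mode = off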
[PERFORM] How many inserts am I doing
PostgreSQL: 8.2

How can you identify how many inserts are being done in a given time frame for a database?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

My e-mail address has changed to [EMAIL PROTECTED]
[PERFORM] How many updates and inserts
PostgreSQL: 8.2

How can I identify how many inserts and updates are being done in a given time frame for a database?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

My e-mail address has changed to [EMAIL PROTECTED]
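A hedged sketch that answers this and the previous message: snapshot the cumulative per-table counters, wait out the time frame of interest, then take the difference. In 8.2 these counters are only populated when stats_start_collector and stats_row_level are on:

    -- Take a baseline snapshot:
    CREATE TABLE stat_sample AS
    SELECT now() AS sampled_at,
           sum(n_tup_ins) AS ins,
           sum(n_tup_upd) AS upd
    FROM pg_stat_user_tables;

    -- ...after the time frame has elapsed:
    SELECT now() - s.sampled_at AS elapsed,
           sum(t.n_tup_ins) - s.ins AS inserts,
           sum(t.n_tup_upd) - s.upd AS updates
    FROM pg_stat_user_tables t, stat_sample s
    GROUP BY s.sampled_at, s.ins, s.upd;

    DROP TABLE stat_sample;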
[PERFORM] Creating a foreign key
PostgreSQL: 8.2

When you create a foreign key on a table, is an index created on the foreign key automatically?

Example: table A has a field called id. Table B has a field called fk_a_id with a foreign key constraint referencing field id in table A. Is an index automatically created on field fk_a_id in table B when I create the foreign key constraint? I assume yes, but I wanted to check; I did not see it specifically mentioned in the documentation. I also see "CREATE TABLE / PRIMARY KEY will create implicit index" when creating a primary key, but I don't see any similar statement when creating a foreign key.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] Creating a foreign key
Shaun,
Thanks for the very detailed description of why Postgres does not auto-create indexes. That makes a lot of sense.

Thanks again,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Shaun Thomas [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 08, 2008 12:19 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Creating a foreign key

On Thu, 2008-05-08 at 17:52 +0100, Campbell, Lance wrote:

> Is there an index automatically created on field fk_a_id in table B when I create a foreign key constraint?

No. The problem with doing this is that it assumes certain things about your infrastructure that may be entirely false. Indexes exist to speed up queries by logarithmically reducing the result set to matched index parameters, and pretty much nothing else. Indexes are also not free, taking up both disk space and CPU time to maintain, which slows down inserts. Foreign keys are not bi-directional either: they actually check the index in the *source* table to see if the value exists. Having an index on a column referring to another table may be advantageous, but it's not always necessary. If you never use that column in a where clause, or it isn't restrictive enough, you gain nothing and lose speed in table maintenance. It's totally up to the focus of your table schema design, really. Only careful app management and performance analysis can really tell you where indexes need to go, beyond the rules-of-thumb concepts, anyway.

> I also see CREATE TABLE / PRIMARY KEY will create implicit index when creating a primary key but I don't see any similar statement when creating a foreign key.

That's because the definition of a primary key is an index that acts as the primary lookup for the table. This is required to be an index, partially because it has an implied unique constraint, and also because it has a search span of approximately 1 when locating a specific row from that table. But indexes aren't some kind of magical make-a-query-faster sauce. With too many values, the cost of scanning them individually becomes prohibitive, and the database will fall back to a faster sequence scan, which can take advantage of the block-fetch nature of most storage devices to just blast through all the results for the values it's looking for. It's restrictive where clauses *combined* with well-chosen indexes that give you good performance, with a little tweaking here and there to make the query planner happy. But that's the long version. Postgres is by no means bare-bones, but it assumes DBAs are smart enough to manage the structures they bolt onto the metal. :)

--
Shaun Thomas
Database Administrator
Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com
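So when the referencing column is used in lookups, or deletes and updates on the referenced table need fast foreign-key checks against the referencing table, the index has to be created by hand. A minimal sketch for the A/B example from the original question:

    CREATE INDEX b_fk_a_id_idx ON b (fk_a_id);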
[PERFORM] Backup causing poor performance - suggestions
PostgreSQL: 8.2.4

We currently back up all of our database tables, per schema, using pg_dump every half hour. We have been noticing that database performance is very poor during the backup process. How can I improve the performance?

Server specs:
Dedicated DB server
Database takes up 8.0 GB of disk space
Two dual-core Xeon 5160s at 3.0 GHz
16 GB of memory
Two disks in RAID 1 used for the OS, database and backups; SAS 10,000 RPM drives
OS: Linux AS 4.x 64-bit

shared_buffers = 1GB
work_mem = 20MB
max_fsm_pages = 524288
random_page_cost = 1.0
effective_cache_size = 16GB
max_connections = 150

All other settings are the defaults. I have tried doing backups to a second set of disks, but the performance only improved somewhat. Does anyone have advice on how to improve my performance during backups? Would adding two quad-core processors improve performance?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] Backup causing poor performance - suggestions
Scott,
The server is a Dell PowerEdge 2900 II with the standard Perc 6/i SAS controller with 256 MB cache.

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Scott Marlowe [mailto:[EMAIL PROTECTED]]
Sent: Monday, May 05, 2008 10:06 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Backup causing poor performance - suggestions

On Mon, May 5, 2008 at 8:59 AM, Campbell, Lance [EMAIL PROTECTED] wrote:
> PostgreSQL: 8.2.4

You should update to 8.2.7 as a matter of periodic maintenance. It's a very short and easy update.

> We currently backup all of our database tables per schema using pg_dump every half hour. We have been noticing that the database performance has been very poor during the backup process. How can I improve the performance? [...]
> Two disks in raid 1 are used for the OS, database and backups. SAS 10,000 RPM drives.

So, what kind of RAID controller are you using? And can you add more drives and/or battery-backed cache to it?
Re: [PERFORM] Planning hot/live backups?
I back up around 10 GB of data every half hour using pg_dump. I don't back up the entire database at once; instead I back up at the schema-namespace level, but I do all of them every half hour. It takes four minutes, including the time to copy the files to the backup server. I run each schema-namespace backup consecutively. I also run VACUUM FULL ANALYZE once a day. My system is up 24/7 as well. I don't back up in the middle of the night, since there is so little to back up then, though I could; skipping the hours when there are only a handful of transactions is what lets me afford more backups during the day.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Steve Poe
Sent: Monday, March 24, 2008 3:23 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Planning hot/live backups?

The owners of the animal hospital where I work want to consider live/hot backups throughout the day so we're less likely to lose a whole day of transactions. We use PostgreSQL 8.0.15. We do 3 AM backups, using pg_dumpall, to a file when there is very little activity. The hospital enjoys the overall performance of the veterinary application running on PostgreSQL. I know that doing a mid-day backup when up to 60 computers (consistently 35-40) are accessing client/patient information will cause some frustration. I understand there needs to be a balance between performance and backup of current records. While I know that not all situations are the same, I am hoping others can share the performance latency they have experienced when doing backups during the day and/or planning for a cluster (or other types of redundancy). My animal hospital operates 24x7 and is in the south part of the San Francisco Bay area. Outside of sharing your experiences/input with me, I would not mind if you/your company do this type of consulting offline.

Thank you.

Steve
[PERFORM] Windows XP 64 bit
I posted this question to the admin group before realizing I should have sent it here. I just read the following article:

http://people.planetpostgresql.org/mha/index.php?/archives/162-PostgreSQL-vs-64-bit-windows.html

Would there be benefits to running PostgreSQL in 32-bit mode on a 64-bit version of XP? My thought is that the OS could access more of the memory for caching files. On my production Linux box I don't allocate more than 2 GB to PostgreSQL; I leave the rest of the memory available for caching disk files. So even though PostgreSQL would be running in 32-bit mode, it seems it would still run better on a 64-bit XP box than on a 32-bit version. This of course assumes that one has a sizeable database and more than 3 GB of memory. Is this a correct assumption? Would the performance be relatively similar to that of Linux?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
[PERFORM] Large Objects and Toast
PostgreSQL: 8.2

My understanding is that when you create a large object there is no way to link it to a field in a table so that a cascading delete can occur; you have to delete the large object manually. Is this correct?

I also read something about the OID type being limited in size. What is the size limit of the OID type? I am sure it is bigger than the number of files I would be uploading into my db, but I just want to get an idea of the range. When putting a reference to a large object in a table, should the type of the referencing column be OID?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
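On the range: an OID is an unsigned 32-bit value, so roughly four billion. For the cleanup problem, contrib/lo is the usual answer: it provides a lo domain over oid plus a lo_manage() trigger function that unlinks the underlying large object when a row is updated or deleted. A hedged sketch (table and column names invented; assumes contrib/lo is installed, and the contrib vacuumlo utility can sweep up any orphans):

    CREATE TABLE document (
        id   integer PRIMARY KEY,
        file lo                  -- lo is a domain over oid from contrib/lo
    );

    CREATE TRIGGER t_document
        BEFORE UPDATE OR DELETE ON document
        FOR EACH ROW EXECUTE PROCEDURE lo_manage(file);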
[PERFORM] Putting files into fields in a table
PostgreSQL: 8.2

I am looking at the possibility of storing files in some of my database tables. My concern is obviously performance. I have configured PostgreSQL to take advantage of Linux file caching, so my PostgreSQL does not have a large setting for shared_buffers even though I have 24 GB of memory. The performance today is very good.

Some questions I have:
What data type should I use for fields that hold files?
Is there anything that I should be aware of when putting files into a field in a table?
When PostgreSQL accesses a table with fields that contain files, does it put those fields into the shared_buffers memory area?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
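For context, a hedged sketch of the bytea route discussed in the replies that follow (names invented for illustration). Values wider than roughly 2 kB are TOASTed automatically, i.e. compressed and stored out of line:

    CREATE TABLE uploaded_file (
        id        serial PRIMARY KEY,
        filename  varchar NOT NULL,
        mime_type varchar NOT NULL,
        contents  bytea NOT NULL
    );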
Re: [PERFORM] Putting files into fields in a table
I did not see much info in the 8.2 documentation on BLOBs. I did read about the bytea binary data type, and it seems like it would work for storing files. I guess I could stick with the OS for file storage, but it is a pain; it would be easier to use the DB.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Alexander Staubo
Sent: Thursday, December 13, 2007 1:39 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Putting files into fields in a table

On 12/13/07, Campbell, Lance [EMAIL PROTECTED] wrote:
> I am looking at the possibility of storing files in some of my database tables. My concern is obviously performance. I have configured PostgreSQL to take advantage of Linux file caching. So my PostgreSQL does not have a large setting for shared_buffers even though I have 24G of memory.

This used to be the recommended way before 8.0. In 8.0, it is advantageous to give PostgreSQL more buffers. You should still make some room for the kernel cache.

By storing files, I assume you mean a lot of data imported from files. The pros and cons of storing large amounts of data as PostgreSQL tuples have been debated before; you might want to search the archives. My opinion is that PostgreSQL is fine up to a point (let's say 10,000 tuples x 2 kB), above which I would merely store references to file-system objects. Managing these objects can be painful, especially in a cluster of more than one machine, but at least it's fast and lightweight.

> What data type should I use for fields that hold files?

PostgreSQL has two ways of storing large amounts of data in a single tuple: variable-length columns, and blobs. Blobs are divided into chunks and stored in separate tables, one tuple per chunk, indexed by offset, and PostgreSQL allows random access to the data. The downside is that they take up more room, they're slower to create, slower to read from end to end, and I believe there are some operations (such as function expressions) that don't work on them. Some replication products, including (the last time I looked) Slony, do not support replicating blobs. Blobs are not deprecated, I think, but they feel like they are.

Variable-length columns such as bytea and text support a system called TOAST, which allows the first few kilobytes of the data to be stored in-place in the tuple, and the overflow to be stored elsewhere and potentially compressed. This system is much faster and tighter than blobs, but does not offer random I/O.

> Is there anything that I should be aware of when putting files into a field in a table?

Backup dumps will increase in size in proportion to the size of your data. PostgreSQL is no speed demon at loading/storing data, so this might turn out to be the Achilles heel.

> When PostgreSQL accesses a table that has fields that contain files does it put the fields that contain the files into the shared_buffers memory area?

I believe so.

Alexander.
Re: [PERFORM] Putting files into fields in a table
Erik,
The advantage of storing things in the database versus the file system is the number of lines of code. I manage 18 software applications. I have developed an application that reads in an XML file and generates database Java code for inserting, updating, selecting and deleting data. So for me the database is a no-brainer. But when I need to store files that are uploaded by users, I have to hand-code the process. It is not hard, just time consuming. I want to keep the amount I can do per hour at a very high level; the less code the better. Using a database correctly really saves on the number of lines of code.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Erik Jones [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 13, 2007 2:22 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org list
Subject: Re: [PERFORM] Putting files into fields in a table

On Dec 13, 2007, at 2:09 PM, Campbell, Lance wrote:

> I did not see much info in the 8.2 documentation on BLOB. I did read about the bytea or binary data type. It seems like it would work for storing files. I guess I could stick with the OS for file storage but it is a pain. It would be easier to use the DB.

In Postgres they're simply called Large Objects (or LOBs), and there is a whole chapter devoted to them in Part IV of the manual. Note that you only need to use this facility if you're going to be storing data over 1 GB in size (at which point your limit becomes 2 GB). What kind of data is in these files? What gain do you foresee in storing the files directly in the db (as opposed, say, to storing the paths to the files in the filesystem)?

Erik Jones
Software Developer | Emma(r)
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
[PERFORM] clear pg_stats
How can I clear the pg_stat views without restarting PostgreSQL? I thought there was a function.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
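The function being half-remembered is most likely pg_stat_reset(); a hedged one-liner (superuser only, and it resets the statistics collector's counters for the current database only):

    SELECT pg_stat_reset();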
[PERFORM] work_mem and shared_buffers
Does the amount of memory allocated to work_mem get subtracted from shared_buffers?

Example: if work_mem is 1 MB, there are 10 connections, and shared_buffers is 100 MB, would there be 90 MB left for shared_buffers? Or does the memory allocated for work_mem have nothing to do with shared_buffers?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
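They are separate pools: shared_buffers is a single shared allocation made at server start, while work_mem is private memory that each backend may allocate per sort or hash operation, potentially several times within one query. A hedged worst-case tally for the numbers in the question:

    shared_buffers (shared, fixed at startup)        100 MB
    work_mem: 10 connections x 1 sort each       +    10 MB
                                                    -------
    approximate peak memory                          110 MB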
Re: [PERFORM] work_mem and shared_buffers
Wow. That is a nice logging feature in 8.3!

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Bill Moran [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 09, 2007 2:08 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] work_mem and shared_buffers

On Fri, 9 Nov 2007 12:08:57 -0600 Campbell, Lance [EMAIL PROTECTED] wrote:

> How do you know when you should up the value of work_mem? Just play with the number. Is there a query I could do that would tell me if PostgreSQL is performing SQL that could use more memory for sorting?

In 8.2 and older, it can be difficult to know, and I don't have a specific recommendation. 8.3 includes a parameter to log the usage of temporary files by Postgres. When a sort can't fit in the available memory, it uses a temp file, thus you could use this new feature to track when sorts don't fit in work_mem.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
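The 8.3 parameter being praised is log_temp_files; a hedged postgresql.conf sketch:

    # Log the creation of every temporary file along with its size.
    # The value is a size threshold in kB; 0 logs all temp files.
    # Sorts and hashes that spill past work_mem show up here.
    log_temp_files = 0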
Re: [PERFORM] work_mem and shared_buffers
It is amazing how, after working with databases very actively for over 8 years, I am still learning things.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Scott Marlowe [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 09, 2007 1:13 PM
To: Campbell, Lance
Cc: Heikki Linnakangas; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] work_mem and shared_buffers

On Nov 9, 2007 12:08 PM, Campbell, Lance [EMAIL PROTECTED] wrote:

> How do you know when you should up the value of work_mem? Just play with the number. Is there a query I could do that would tell me if PostgreSQL is performing SQL that could use more memory for sorting?

Trial and error. Note that you can set work_mem for a given session. While it may seem that making work_mem bigger will always help, that's not necessarily the case. Using this query:

    select count(*) from (
        select * from myreporttable
        where lasttime > now() - interval '1 week'
        order by random()
    ) as l

I did the following (I ran the query by itself once first, to fill the buffers/cache of the machine with the data):

    work_mem    Time
    1000kB      29215.563 ms
    4000kB      20612.489 ms
    8000kB      18408.087 ms
    16000kB     16893.964 ms
    32000kB     17681.221 ms
    64000kB     22439.988 ms
    125MB       23398.891 ms
    250MB       25461.797 ms

Note that my best time was at around 16 MB work_mem. This data set is MUCH bigger than 16 MB, around 300-400 MB, but work_mem optimized out at 16 MB. Btw, I tried going as high as 768 MB, and it was still slower than 16 MB. This machine has 2 GB of RAM and is optimized for IO, not CPU performance.
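The per-session experiment Scott describes looks roughly like this, using the table from his example:

    SET work_mem = '16MB';   -- affects only the current session
    EXPLAIN ANALYZE
    SELECT * FROM myreporttable
    WHERE lasttime > now() - interval '1 week'
    ORDER BY random();
    RESET work_mem;          -- back to the postgresql.conf value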
[PERFORM] index stat
PostgreSQL: 8.2.4

I am now collecting statistics on my database. I have used the following two queries:

    select * from pg_statio_all_indexes;
    select * from pg_stat_all_indexes;

How can I use the information from these two queries to better optimize my indexes, or maybe even get rid of some unnecessary ones?

Example output from pg_statio_all_indexes:

     relid  | indexrelid | schemaname |        relname        |         indexrelname         | idx_blks_read | idx_blks_hit
    --------+------------+------------+-----------------------+------------------------------+---------------+--------------
      16801 |      57855 | a          | screen                | screen_index1                |          1088 |       213618
      16801 |      57857 | a          | screen                | screen_index3                |           905 |       201219
      16803 |      16805 | pg_toast   | pg_toast_16801        | pg_toast_16801_index         |          3879 |      1387471
      16978 |      16980 | pg_toast   | pg_toast_16976        | pg_toast_16976_index         |             0 |            0
     942806 |     942822 | b          | question_result_entry | question_result_entry_index1 |            18 |            0
     942806 |     942824 | b          | question_result_entry | question_result_entry_index2 |            18 |            0
     942806 |     942828 | b          | question_result_entry | question_result_entry_index3 |            18 |            0

Example output from pg_stat_all_indexes:

     relid  | indexrelid | schemaname |        relname        |         indexrelname         | idx_scan | idx_tup_read | idx_tup_fetch
    --------+------------+------------+-----------------------+------------------------------+----------+--------------+---------------
      16801 |      57855 | a          | screen                | screen_index1                |    48693 |      1961745 |       1899027
      16801 |      57857 | a          | screen                | screen_index3                |    13192 |       132214 |         87665
      16803 |      16805 | pg_toast   | pg_toast_16801        | pg_toast_16801_index         |   674183 |       887962 |        887962
      16978 |      16980 | pg_toast   | pg_toast_16976        | pg_toast_16976_index         |        0 |            0 |             0
     942806 |     942822 | b          | question_result_entry | question_result_entry_index1 |        0 |            0 |             0
     942806 |     942824 | b          | question_result_entry | question_result_entry_index2 |        0 |            0 |             0
     942806 |     942828 | b          | question_result_entry | question_result_entry_index3 |        0 |            0 |             0

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
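One hedged reading of output like the above: indexes whose idx_scan stays at zero over a representative period (question_result_entry_index1 through 3 here) cost maintenance on every write yet serve no reads, which makes them candidates for removal. A sketch that lists them, bearing in mind that indexes enforcing primary-key or unique constraints must stay regardless:

    SELECT schemaname, relname, indexrelname, idx_scan
    FROM pg_stat_all_indexes
    WHERE idx_scan = 0
      AND schemaname NOT IN ('pg_catalog', 'pg_toast')
    ORDER BY schemaname, relname;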
[PERFORM] Training Recommendations
PostgreSQL: 8.2.4

Does anyone have any companies they would recommend for PostgreSQL performance-tuning training on Linux? Or for general DBA training?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] Suggestions on an update query
Thanks for all of your help. The problem was that the result_entry table had some constraints that pointed to a third table. When I removed those constraints, the performance was amazing: the update took less than seven minutes to execute. I had not even considered that constraints pointing to another table would impact performance.

Thanks again,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Gregory Stark
Sent: Friday, October 26, 2007 9:05 PM
To: Joshua D. Drake
Cc: Campbell, Lance; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Suggestions on an update query

Joshua D. Drake [EMAIL PROTECTED] writes:

> On Fri, 26 Oct 2007 15:31:44 -0500 Campbell, Lance [EMAIL PROTECTED] wrote:
>> I forgot to include an additional parameter I am using in postgresql.conf:
>
> O.k. first, just to get it out of the way (and then I will try and help). Please do not top post, it makes replying contextually very difficult.
>
>> PostgreSQL version 8.2.4
>> Memory = 8 Gig
>> CPUs: 1 dual core Xeon running at 3.0
>
> O.k. first you might be grinding through your 20 checkpoint segments, but in reality what I think is happening is you are doing foreign key checks against all of it and slowing things down.

If you're going to berate someone about top-posting, perhaps you should attach your own commentary to relevant bits of context :P

But the original post didn't include any foreign key constraints. I suspect you've guessed it right, though. In fact I suspect what's happening is he doesn't have an index on the referencing column, so the foreign key checks are doing sequential scans.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
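The pattern that emerged from this thread, as a hedged sketch; the constraint and referenced-table names below are hypothetical, since the thread never names the third table. Drop the foreign keys, run the bulk update, then re-add them (or, alternatively, index the referencing columns so each per-row check is cheap):

    -- Hypothetical names; adjust to the real constraint and table.
    ALTER TABLE result_entry DROP CONSTRAINT result_entry_option_fkey;

    UPDATE result_entry
    SET question_number = question_number.question_number
    FROM question_number
    WHERE result_entry.fk_question_id = question_number.fk_question_id;

    ALTER TABLE result_entry
        ADD CONSTRAINT result_entry_option_fkey
        FOREIGN KEY (fk_option_order_id) REFERENCES option_order (id);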
Re: [PERFORM] Suggestions on an update query
I forgot to include an additional parameter I am using in postgresql.conf:

checkpoint_segments = 30

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Campbell, Lance
Sent: Friday, October 26, 2007 3:27 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Suggestions on an update query

> PostgreSQL version 8.2.4. I have a problem with an update query taking over 10 hours in order to run. [...]
[PERFORM] Suggestions on an update query
PostgreSQL version 8.2.4
Memory = 8 GB
CPUs: 1 dual-core Xeon running at 3.0 GHz

I have a problem with an update query taking over 10 hours to run. I rebooted my server and ran the SQL command ANALYZE. Could you please help me with any suggestions? I have included the two tables involved in the update below, as well as the indexes I am using.

The table result_entry contains 17,767,240 rows and the table question_number contains 40,787. Each row from the result_entry table matches one and only one row in the table question_number via the fk_question_id field. Each row from the question_number table matches an average of 436 rows in the result_entry table.

CREATE TABLE question_number (
    fk_form_id      integer not null,
    fk_question_id  integer not null,
    question_number integer not null,
    sequence_id     integer not null
);

ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey PRIMARY KEY (fk_question_id);
CREATE INDEX question_number_index1 ON question_number USING btree (question_number);

CREATE TABLE result_entry (
    fk_result_submission_id integer NOT NULL,
    fk_question_id          integer NOT NULL,
    fk_option_order_id      integer NOT NULL,
    value                   character varying,
    order_id                integer NOT NULL,
    question_number         integer
);

CREATE INDEX result_entery_index1 ON result_entry USING btree (fk_question_id);

update result_entry set question_number=question_number.question_number
from question_number
where result_entry.fk_question_id=question_number.fk_question_id;

explain update result_entry set question_number=question_number.question_number
from question_number
where result_entry.fk_question_id=question_number.fk_question_id;

                                       QUERY PLAN
    ------------------------------------------------------------------------------------
     Hash Join  (cost=1437.71..1046983.94 rows=17333178 width=32)
       Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
       ->  Seq Scan on result_entry  (cost=0.00..612216.78 rows=17333178 width=28)
       ->  Hash  (cost=927.87..927.87 rows=40787 width=8)
             ->  Seq Scan on question_number  (cost=0.00..927.87 rows=40787 width=8)
    (5 rows)

postgresql.conf settings:

shared_buffers = 1GB
work_mem = 10MB
max_fsm_pages = 204800
random_page_cost = 1.0
effective_cache_size = 8GB

Thanks for any help!

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
[PERFORM] SQL Monitoring
PostgreSQL 8.2.4

I have noticed a slight spike in the amount of CPU usage in the last few weeks. I am sure it has to do with a change or two that was made to some queries. What is the best way to log the SQL that is being executed? I would prefer to limit the size of the log file to 2 GB. Is there a way to do this?

Thanks for any help,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
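A hedged postgresql.conf sketch using the 8.2-era parameter names (redirect_stderr became logging_collector in 8.3). There is no direct cap-the-total-at-2-GB knob; bounded rotation with truncation approximates one, and a duration threshold keeps the volume down while still catching the expensive statements:

    redirect_stderr = on                 # write server log output to files
    log_directory = 'pg_log'
    log_filename = 'postgresql-%a.log'   # one file per weekday, recycled
    log_truncate_on_rotation = on        # overwrite last week's file
    log_rotation_age = 1440              # minutes: rotate daily
    log_rotation_size = 262144           # kB: also rotate at 256 MB
    log_min_duration_statement = 250     # ms: log statements slower than this

Seven weekday files capped at 256 MB each keeps the total under the 2 GB target; setting log_min_duration_statement = 0 logs every statement, at the cost of much more volume.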
[PERFORM] Installing PostgreSQL
Should installation questions be sent here or to the admin list?

OS: Red Hat Linux
Version of PostgreSQL: 8.2.4

The group that now manages our server set up a directory/partition for us to put PostgreSQL into. The directory is called pgsql_data. The directory is more than a regular directory: it contains a subdirectory called lost+found, so I assume it is a mounted partition. I tried installing PostgreSQL directly into this directory, but it failed because there is already a file in the directory, lost+found. Is there a way around this?

Worst case, I will create a subdirectory called data and put the install in there, but I would have preferred to put it directly into pgsql_data; no files other than PostgreSQL's would have gone into the directory/partition. Would it be possible to install PostgreSQL into a subdirectory of pgsql_data and then move the files up a directory into pgsql_data?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] Installing PostgreSQL
Richard,
So what you are saying is that if you install PostgreSQL into a data directory /abc/data, you could then stop the database, move the files into /def/data, and then start the database, making sure to point at the new data directory. PostgreSQL references its files relative to whatever data directory the files are in. Is this a correct observation?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Richard Huxton [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 23, 2007 12:08 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Installing PostgreSQL

Campbell, Lance wrote:
> Should installation questions be sent here or to the admin listserv?

Probably the pgsql-general/admin/novice lists.

> OS: redhat linux

RHES?

> Version of PostgreSQL: 8.2.4

OK

> I had a group that now manages our server set up a directory/partition for us to put postgreSQL into. The directory is called pgsql_data. The directory is more than a regular directory. It contains a subdirectory called lost+found. I would assume this is a logical partition.

No - if you get filesystem corruption, any recovered disk blocks are put into files here. All your disk partitions will have such a directory.

> I tried installing postgreSQL directly into this directory but it failed since there is a file in this directory, lost+found. Is there a way around this? Worst case scenario I will create a subdirectory called data and put the install in there.

That's what you want to do. Apart from anything else, it lets you set ownership and permissions of the directory.

> I would have preferred to put it directly into the pgsql_data. There would be no other files that would have gone into the directory/partition other than postgreSQL. Would it be possible for me to install postgreSQL into a sub directory of pgsql_data and then move the files up a directory into pgsql_data?

Just symlink your directory to the correct place if that's what you want.

Partition at: /mnt/pg_disk
Directory is: /mnt/pg_disk/data
symlink to:   /var/db/data

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Installing PostgreSQL
Richard,
I was able to prove that it works. Thanks for your time.

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Richard Huxton [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 23, 2007 12:26 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Installing PostgreSQL

Campbell, Lance wrote:
> So what you are saying is that if you install PostgeSQL into a data directory /abc/data you could then stop the database, move the files into /def/data, and then start the database making sure to point to the new data directory. PostgreSQL is therefore referencing its files relative to the data directory the files are in. Is this a correct observation?

Yes - provided:
1. Ownership and permissions on the destination directory are correct
2. You remember to stop the server when copying

--
Richard Huxton
Archonet Ltd
[PERFORM] Terminology Question
Terminology question: if I use the statement "I am backing up schema XYZ every 30 minutes," does that statement imply that I am backing up only the definition of the data? Or does it mean that I am backing up both the definition of the data and the data within the schema object?

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
[PERFORM] Table Statistics with pgAdmin III
I have installed pgAdmin III 1.6. In the tool, when you click on a particular table you can select a tab called Statistics, which has all kinds of info on your table. For some reason the only info I see is table size, TOAST table size and index size; the other 15 fields show zeros. Is there a reason for this? I was thinking that maybe I needed to turn on a setting within my database in order to get statistics reported.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] Table Statistics with pgAdmin III
All of the fields are zero except for the three I listed in my posting.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Dave Page
Sent: Tuesday, July 24, 2007 12:50 PM
To: Jean-Max Reymond
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Table Statistics with pgAdmin III

--- Original Message ---
From: Jean-Max Reymond [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Sent: 24/07/07, 18:23:53
Subject: Re: [PERFORM] Table Statistics with pgAdmin III

Campbell, Lance wrote:
> I have installed pgAdmin III 1.6. In the tool when you click on a particular table you can select a tab called Statistics. [...]

it seems that the module pgstattuple is needed

That'll allow you to see extra stats in 1.8, but won't alter what you already see; in fact 1.6 won't use it at all. What values are at zero?

Regards, Dave
Re: [PERFORM] Volunteer to build a configuration tool
Greg,
I have a PostgreSQL database that runs on a dedicated server with 24 GB of memory. What would be the maximum size I would ever want to set shared_buffers to if I were relying on the OS-disk-caching approach? It seems that no matter how big your dedicated server is, there would be a top limit to the size of shared_buffers.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Greg Smith
Sent: Thursday, June 21, 2007 2:15 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Volunteer to build a configuration tool

On Wed, 20 Jun 2007, Campbell, Lance wrote:

> If everything I said is correct then I agree: why have effective_cache_size? Why not just go down the approach that Oracle has taken and require people to rely more on shared_buffers and the general memory driven approach? Why rely on the disk caching of the OS?

First off, it may help explain the dynamics here if you know that until fairly recent releases, the PostgreSQL shared_buffers cache had some performance issues that made it impractical to make it too large. It hasn't been that long that relying more heavily on the Postgres cache was technically feasible. I think the user community at large is still assimilating all the implications of that shift, and as such some of the territory with making the Postgres memory really large is still being mapped out.

There are also still some issues left in that area. For example, the bigger your shared_buffers cache is, the worse the potential is for having a checkpoint take a really long time and disrupt operations. There are OS tunables that can help work around that issue; similar ones for the PostgreSQL buffer cache won't be available until the 8.3 release.

In addition to all that, there are still several reasons to keep relying on the OS cache:

1) The OS cache memory is shared with other applications, so relying on it lowers the average memory footprint of PostgreSQL. The database doesn't have to be a pig that constantly eats all the memory up, while still utilizing it when necessary.

2) The OS knows a lot more about the disk layout and similar low-level details and can do optimizations a platform-independent program like Postgres can't assume are available.

3) There are more people working on optimizing the caching algorithms in modern operating systems than are coding on this project. Using that sophisticated cache leverages their work.

The Oracle Way presumes that you've got such a massive development staff that you can solve these problems better yourself than the community at large, and then support that solution on every platform. This is why they ended up with solutions like raw partitions, where they just put their own filesystem on the disk and figure out how to make that work well everywhere. If you look at trends in this area, at this point the underlying operating systems have gotten good enough that tricks like that are becoming marginal. Pushing more work toward the OS is a completely viable design choice that strengthens every year.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Volunteer to build a configuration tool
It is amazing how many times you can read something before it actually sinks in. There seem to be two possible approaches to optimizing PostgreSQL 8.2:

File-caching approach: This approach relies on the OS to cache the necessary PostgreSQL files. The key is to set effective_cache_size as high as you think the OS has memory available to cache the files, while keeping shared_buffers relatively low; otherwise you are in a sense storing the data twice. One would also make sure that work_mem is not too high; since the files are cached by the OS, work_mem can stay relatively low. This is an ideal approach if you have a dedicated server, since no other software is using memory or accessing files that the OS would try to cache.

Memory-driven approach: In this approach you set a large value for shared_buffers, relying on it to hold the most commonly accessed disk blocks. The value for effective_cache_size would be relatively small, since you are not relying on the OS to cache files. This seems like the ideal situation if you have other applications running on the box: by setting shared_buffers to a high value you guarantee memory to PostgreSQL (assuming the other applications did not eat up so much memory that the OS starts swapping). This is also closer to how Oracle approaches things.

Do I understand the possible optimization paths correctly? The one question I have about the memory-driven approach: since effective_cache_size would be small, I assume I would need to fiddle with random_page_cost, since there would be no way for PostgreSQL to know I have a well-configured system.

If everything I said is correct, then I agree: why have effective_cache_size? Why not just go down the approach that Oracle has taken and require people to rely more on shared_buffers and the general memory-driven approach? Why rely on the disk caching of the OS? Memory is only getting cheaper.

Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Greg Smith
Sent: Wednesday, June 20, 2007 10:21 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Volunteer to build a configuration tool

On Wed, 20 Jun 2007, PFC wrote:

> Except planner hints (argh) I see no way to give this information to the machine... since it's mostly in the mind of the DBA.

And the mind of the DBA has a funny way of being completely wrong some days about what's really happening under the hood.

> Maybe a per-table cache temperature param (hot, warm, cold), but what about the log table, the end of which is cached, but not the old records? It's messy.

One of the things that was surprising to me when I started looking at the organization of the PostgreSQL buffer cache is how little gross information about its contents is available. I kept expecting to find a summary section where you could answer questions like "how much of the cache currently has information about index/table X?" used as an input to the optimizer. I understand that the design model expects much of this is unknowable due to the interaction with the OS cache, and in earlier versions you couldn't make shared_buffers big enough for its contents to be all that interesting, so until recently this wasn't worth collecting.

But in the current era, where it's feasible to have multi-GB caches efficiently managed by PG and one can expect processor time is relatively cheap, it seems to me one way to give a major boost to the optimizer is to add some overhead to buffer cache management so it collects such information. When I was trying to do a complete overhaul on the background writer, the #1 problem was that I had to assemble my own statistics on what was inside the buffer cache as it was scanned, because a direct inspection of every buffer is the only way to know things like what percentage of the cache is currently dirty.

I can't figure out if I'm relieved or really worried to discover that Tom isn't completely sure what to do with effective_cache_size either.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
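A hedged pair of postgresql.conf sketches for the two approaches on a dedicated 24 GB server (illustrative values only; note that effective_cache_size is usually read as the total of shared_buffers plus OS cache, so one common view is that it stays large under either approach):

    # 1) File-caching approach: the OS cache does the heavy lifting
    shared_buffers = 1GB
    effective_cache_size = 20GB   # rough guess at memory left for OS caching

    # 2) Memory-driven approach: PostgreSQL holds the working set itself
    shared_buffers = 12GB
    effective_cache_size = 16GB   # shared_buffers plus remaining OS cache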
[PERFORM] PostgreSQL Configuration Tool for Dummies
Please read the whole email before replying: I love the feedback I have received but I feel that somehow I did not communicate the intent of this mini project very well. So let me outline a few basics and who the audience was intended for. Mini project title: Initial Configuration Tool for PostgreSQL for Dummies 1) This is intended for newbie's. Not for experienced users or advanced DBAs. 2) This tool is NOT intended to monitor your PostgreSQL efficiency. 3) I suggested JavaScript because most people that get started with PostgreSQL will go to the web in order to find out about issues relating to configuration. I wanted a very simple way for people to access the tool that would not be tied to any particular environment or OS. If there is someone that is using a text browser to view the web then they are probably geeky enough not to want to bother with using this tool. 4) The intent is just to give people that have no clue a better starting point than some very generic defaults. Please think simple. I stress the word simple. The real challenge here is getting the formulas correct. Someone mentioned to not focus on the values but just get something out there for everyone to help tweak. I agree! What questions do you think should be asked in order to figure out what values should go into the formulas for the configuration suggestions? My thoughts: What version of PostgreSQL are you using? How much memory will be available to PostgreSQL? How many connections will be made to PostgreSQL? Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
Re: [PERFORM] PostgreSQL Configuration Tool for Dummies
Francisco and Richard, Why ask about disk or RAID? How would that impact any settings in postgresql.conf? I did forget the obvious question: What OS are you using? Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu -Original Message- From: Francisco Reyes [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 19, 2007 11:58 AM To: Campbell, Lance Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL Configuration Tool for Dummies Campbell, Lance writes: 3) I suggested JavaScript because most people that get started with PostgreSQL will go to the web in order to find out about issues relating Why not C? It could then go into contrib. Anyways.. language is likely the least important issue.. As someone mentioned.. once the formulas are worked out it can be done in a few languages.. as people desire.. How much memory will be available to PostgreSQL? How many connections will be made to PostgreSQL? Will this be a dedicated PostgreSQL server? Will there be mostly reads, or will there also be a significant amount of writes? Are you on a RAID system, or do you have several disks over which you would like to run PostgreSQL?
Re: [PERFORM] PostgreSQL Configuration Tool for Dummies
Yudhvir, I completely agree. I was just putting together a similar email. Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Y Sidhu Sent: Tuesday, June 19, 2007 12:49 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL Configuration Tool for Dummies On 6/19/07, Francisco Reyes [EMAIL PROTECTED] wrote: Campbell, Lance writes: Francisco and Richard, Why ask about disk or RAID? How would that impact any settings in postgresql.conf? If the user has 2 disks and says that he will do a lot of updates, he could put pg_xlog on the second disk. Let's not ask about disk or RAID at this level of sanity tuning. It is important for a newbie to take the right first step. When it comes to disks, we start talking I/O, SATA, SCSI and the varying degrees of SATA and SCSI, and controller cards. Then we throw in RAID and the different levels therein. Add to that, we can talk about drivers controlling these drives and which OS is faster, more stable, etc. As you can see, a newbie would get drowned. So, please keep it simple. I know many people on this list are gurus. We know you are the best in this field, but we are not and are just trying to improve what we have. -- Yudhvir Singh Sidhu 408 375 3134 cell
Re: [PERFORM] PostgreSQL Configuration Tool for Dummies
D'Arcy, I wanted to put it on the www.postgresql.org site. That is what I said in my original email. I don't believe anyone from the actual project has contacted me. I am setting up a JavaScript version first. If someone wants to do a different one, feel free. I will have all of the calculations in the JavaScript, so it should be easy to do it in any language. Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of D'Arcy J.M. Cain Sent: Tuesday, June 19, 2007 12:32 PM To: Francisco Reyes Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL Configuration Tool for Dummies On Tue, 19 Jun 2007 12:58:26 -0400 Francisco Reyes [EMAIL PROTECTED] wrote: Campbell, Lance writes: 3) I suggested JavaScript because most people that get started with PostgreSQL will go to the web in order to find out about issues relating Why not C? Why not whatever and install it on www.PostgreSQL.org? Is there any reason that this tool would need to be run on every installation? Run it on the site and it can always be up to date, and it can be written in whatever language is easiest to maintain on the mother system. I would also like to make a pitch for a JavaScript-free tool. Just collect all the pertinent information, work it out, and display the results on a second page. Some people just don't like JavaScript and turn it off even if their browser can run it. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
[PERFORM] PostgreSQL Configuration Tool for Dummies
Below is a link to the HTML/JavaScript configuration page I am creating: http://www.webservices.uiuc.edu/postgresql/ I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be changed. Memory: there are many different ways to ask about memory. Rather than ask a series of questions, I went with a single question, #2. If it is better to ask about memory in a series of questions, then please give me the questions you would ask and why you would ask each of them. From my understanding, the primary memory issue as it relates to PostgreSQL is how much memory is available to PostgreSQL. Remember that this needs to be as simple as possible. My next step is to list the PostgreSQL parameters found in the postgresql.conf file and how I will generate their values based on the questions I have so far. I will primarily focus on PostgreSQL 8.2.x. Once I have a consensus from everyone, I will put functionality behind the "Generate Suggested Settings" button. Thanks for all of the feedback, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
[PERFORM] PostgreSQL Configuration Tool for Dummies
Now I am at the difficult part: what parameters to calculate and how to calculate them. Everything below has to do with PostgreSQL version 8.2. The parameters I think we should calculate are: max_connections shared_buffers work_mem maintenance_work_mem effective_cache_size random_page_cost Any other variables? I am open to suggestions. Calculations based on values supplied in the questions at the top of the page: max_connections = question #3, with a minimum of 8 effective_cache_size = {question #2} MB maintenance_work_mem = ({question #2} * 0.1) MB Any thoughts on the other variables, based on the questions found at the top of the web page below? http://www.webservices.uiuc.edu/postgresql/ Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
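Since the tool is to be written in JavaScript, a minimal sketch of the calculations proposed above might look like this; the function name is made up, and the shared_buffers, work_mem, and random_page_cost lines are placeholders pending consensus on the formulas:

    // Sketch only: names and placeholder formulas are assumptions,
    // not agreed-upon calculations.
    function suggestSettings(memoryMB, connections) {
        var settings = {};
        // max_connections = question #3, with a minimum of 8
        settings.max_connections = Math.max(connections, 8);
        // effective_cache_size = {question #2} MB
        settings.effective_cache_size = memoryMB + "MB";
        // maintenance_work_mem = ({question #2} * 0.1) MB
        settings.maintenance_work_mem = Math.round(memoryMB * 0.1) + "MB";
        // Still open questions; the values below are placeholders only:
        settings.shared_buffers = Math.round(memoryMB * 0.25) + "MB";
        settings.work_mem = "1MB";
        settings.random_page_cost = 4.0;
        return settings;
    }
    // Example: suggestSettings(2048, 100) gives max_connections = 100,
    // effective_cache_size = "2048MB", maintenance_work_mem = "205MB".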
[PERFORM] Volunteer to build a configuration tool
I am a Java software architect, DBA, and project manager for the University of Illinois, Department of Web Services. We use PostgreSQL to serve about 2 million pages of dynamic content a month: everything from calendars, surveys, forms, discussion boards, and RSS feeds. I am really impressed with this tool. The only major problem area I have found where PostgreSQL is really lacking is in answering the question "what should my initial configuration settings be?" I realize that there are many elements that can impact a DBA's specific database settings, but it would be nice to have a configuration tool that would get someone up and running better in the beginning. This is my idea: A JavaScript HTML page that would have some basic questions at the top: 1) How much memory do you have? 2) How many connections will be made to the database? 3) What operating system do you use? 4) Etc... Next the person would press a "generate" button found below the questions. The JavaScript HTML page would then generate content for two iframes at the bottom of the page. One iframe would contain the contents of the postgresql.conf file. The postgresql.conf settings would be tailored more to the individual's needs than the standard default file. The second iframe would contain the default settings one should consider using with their operating system. My web team would be very happy to develop this for the PostgreSQL project. It would have saved us a lot of time to have had a configuration tool in the beginning. I am willing to make this a very high priority for my team. Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
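As a sketch of the page mechanics described above (the element ids, and the reuse of the suggestSettings() sketch shown earlier, are illustrative assumptions rather than an existing implementation):

    function generate() {
        // Hypothetical field ids; the real page may name its inputs differently
        var memoryMB = parseInt(document.getElementById("memory").value, 10);
        var connections = parseInt(document.getElementById("connections").value, 10);
        var settings = suggestSettings(memoryMB, connections);
        var conf = "";
        for (var name in settings) {
            conf += name + " = " + settings[name] + "\n";
        }
        // First iframe: the generated postgresql.conf contents
        var doc = document.getElementById("confFrame").contentWindow.document;
        doc.open();
        doc.write("<pre>" + conf + "</pre>");
        doc.close();
    }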
Re: [PERFORM] [DOCS] Volunteer to build a configuration tool
Mario, The JavaScript configuration tool I proposed would not be in the install of PostgreSQL. It would be an HTML page. It would be part of the HTML documentation, or it could be a separate HTML page linked from the HTML documentation. Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu -Original Message- From: Mario Gonzalez [mailto:[EMAIL PROTECTED] Sent: Monday, June 18, 2007 10:16 AM To: Campbell, Lance Cc: [EMAIL PROTECTED]; pgsql-performance@postgresql.org Subject: Re: [DOCS] Volunteer to build a configuration tool On 18/06/07, Campbell, Lance [EMAIL PROTECTED] wrote: Next the person would press a "generate" button found below the questions. The JavaScript HTML page would then generate content for two iframes at the bottom of the page. One iframe would contain the contents of the postgresql.conf file. The postgresql.conf settings would be tailored more to the individual's needs than the standard default file. The second iframe would contain the default settings one should consider using with their operating system. I think it could be a great help to newbies. IMVHO a bash script using dialog could be better than a JavaScript file. There are many administrators with no graphical browser or with no JavaScript. -- http://www.advogato.org/person/mgonzalez/
[PERFORM] Recommended Initial Settings
I would like to get someone's recommendations on the best initial settings for a dedicated PostgreSQL server. I do realize that there are a lot of factors that influence how one should configure a database. I am just looking for a good starting point. Ideally I would like the database to reside as much as possible in memory, with no disk access. My current 7.x PostgreSQL database generates a 6 Gig file when doing a database dump. Dedicated PostgreSQL 8.2 Server Red Hat Linux 4.x AS 64 bit version (EM64T) 4 Intel Xeon processors 20 Gig memory Current PostgreSQL database is a 6 Gig file when doing a database dump /etc/sysctl.conf file settings: # 11 Gig kernel.shmmax = 11811160064 kernel.sem = 250 32000 100 128 net.ipv4.ip_local_port_range = 1024 65000 net.core.rmem_default = 262144 net.core.rmem_max = 262144 net.core.wmem_default = 262144 net.core.wmem_max = 262144 postgresql.conf file settings (if not listed then I used the defaults): max_connections = 300 shared_buffers = 10240MB work_mem = 10MB effective_cache_size = 512MB maintenance_work_mem = 100MB Any suggestions would be appreciated! Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
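As a sanity check on the shared-memory figure above (this is just unit conversion, not advice from the thread): kernel.shmmax is specified in bytes, and it must exceed shared_buffers plus PostgreSQL's other shared-memory overhead.

    # 11 Gig = 11 * 1024 * 1024 * 1024 = 11,811,160,064 bytes
    kernel.shmmax = 11811160064
    # shared_buffers = 10240MB (10 Gig), so an 11 Gig ceiling leaves about
    # 1 Gig of headroom for PostgreSQL's other shared-memory structures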
Re: [PERFORM] Recommended Initial Settings
Richard, Thanks for your reply. You said "your operating-system should be doing the caching for you." My understanding is that as long as Linux has memory available it will cache files. From your comment I get the impression that since Linux would be caching the data files for the postgres database, it would be redundant to have a large shared_buffers. Did I understand you correctly? Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu -Original Message- From: Richard Huxton [mailto:[EMAIL PROTECTED] Sent: Friday, February 23, 2007 10:29 AM To: Campbell, Lance Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Recommended Initial Settings Campbell, Lance wrote: I would like to get someone's recommendations on the best initial settings for a dedicated PostgreSQL server. I do realize that there are a lot of factors that influence how one should configure a database. I am just looking for a good starting point. Ideally I would like the database to reside as much as possible in memory, with no disk access. My current 7.x PostgreSQL database generates a 6 Gig file when doing a database dump. Your operating-system should be doing the caching for you. Dedicated PostgreSQL 8.2 Server Red Hat Linux 4.x AS 64 bit version (EM64T) 4 Intel Xeon processors If these are older Xeons, check the mailing list archives for "xeon context switch". 20 Gig memory Current PostgreSQL database is a 6 Gig file when doing a database dump OK, so it's plausible the whole thing will fit in RAM (as a rule of thumb I assume headers, indexes etc. triple or quadruple the size). To know better, check the actual disk usage of $PGDATA. /etc/sysctl.conf file settings: # 11 Gig kernel.shmmax = 11811160064 Hmm - that's a lot of shared RAM. See shared_buffers below. kernel.sem = 250 32000 100 128 net.ipv4.ip_local_port_range = 1024 65000 net.core.rmem_default = 262144 net.core.rmem_max = 262144 net.core.wmem_default = 262144 net.core.wmem_max = 262144 postgresql.conf file settings (if not listed then I used the defaults): max_connections = 300 How many connections do you expect typically/peak? It doesn't cost much to have max_connections set high, but your workload is the most important thing missing from your question. shared_buffers = 10240MB For 7.x that's probably way too big, but 8.x organises its buffers better. I'd still be tempted to start at 1 or 2GB and work up - see where it stops buying you an improvement. work_mem = 10MB If you have large queries doing big sorts I'd increase this. Don't forget it's per-sort, so if you have got about 300 connections live at any one time that could be 300*10MB*N if they're all doing something complicated. If you only have one connection live, you can increase this quite substantially. effective_cache_size = 512MB This isn't setting PG's memory usage, it's telling PG how much data your operating-system is caching. Check "free" and see what it says. For you, I'd expect 10GB+. maintenance_work_mem = 100MB This is for admin-related tasks, so you could probably increase it. Workload workload workload - we need to know what you're doing with it. One connection summarising the entire database will want larger numbers than 100 connections running many small queries. HTH -- Richard Huxton Archonet Ltd
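On the "check free and see what it says" point, here is a hedged illustration of where to look; the numbers are made up for a 20 Gig machine:

    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:         20480      20100        380          0        220      15360
    # The "cached" column (about 15 Gig here) is what the OS is using as file
    # cache, and is roughly the figure effective_cache_size should reflect.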
[PERFORM] work-mem
I have been researching how to improve my overall performance of Postgres. I am a little confused on the reasoning for how work_mem is used in the postgresql.conf file. The way I understand the documentation, you define with work_mem how much memory you want to allocate per sort. Couldn't you run out of memory? This approach seems kind of odd to me. How do you tell the system not to allocate too much memory if you all of a sudden got hit with a heavier number of queries? Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
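To make the concern concrete (with illustrative numbers, not figures from this thread): work_mem is allocated per sort or hash operation, not per server or even per connection, so 300 simultaneous connections each running a query containing two sorts at work_mem = 10MB could in the worst case claim 300 * 2 * 10MB = 6000MB on top of shared_buffers. The usual mitigation is to keep the global work_mem modest and raise it per session (SET work_mem = '...') only for the few queries that genuinely need large sorts.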
[PERFORM] work-mem how do I identify the proper size
If I set work_mem to a particular amount of memory, how do I answer the following questions: 1) How many of my queries were able to run inside the memory I allocated for work_mem? 2) How many of my queries had to run from disk because work_mem was not set high enough? 3) If a query had to go to disk in order to be sorted or completed, is there a way to identify how much memory it would have taken in order to run the query from memory? Thanks for all of your help, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu
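One way to answer these questions query by query, to my knowledge available only in releases after 8.2 (8.3's EXPLAIN ANALYZE began reporting the sort method; the table and column names below are hypothetical):

    EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;
    -- In the output, look at the Sort node's method line:
    --   Sort Method: quicksort  Memory: 2048kB       (fit within work_mem)
    --   Sort Method: external merge  Disk: 10240kB   (spilled to disk; the kB
    --   figure hints at how much memory the sort wanted)

For an aggregate view, the log_temp_files setting (also added after 8.2, if memory serves) logs each temporary file a query spills to disk along with its size.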