[PERFORM] Vacuum Analyze problem
Hello everyone:

I wanted to ask about how VACUUM ANALYZE works. Is it possible for something to happen that resets its effects, forcing the VACUUM ANALYZE command to be executed again? I am asking because I am struggling with a query that works fine after I run a VACUUM ANALYZE, but then suddenly starts to take forever to execute, until I run another VACUUM ANALYZE, and so on... I'd like to point out that I am a novice when it comes to non-basic PostgreSQL performance topics.

Thank you all in advance,

Rafael
Re: [PERFORM] Vacuum Analyze problem
On 9/4/07, [EMAIL PROTECTED] wrote:
> I wanted to ask about how VACUUM ANALYZE works. Is it possible for
> something to happen that resets its effects, forcing the VACUUM ANALYZE
> command to be executed again?

Yes: lots of modifications (INSERT, UPDATE, DELETE) to the table in question.

Regards,
MP
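(A quick illustration of Mikko's point - this is a sketch, 'mytable' is a placeholder name, and it assumes row-level statistics collection is enabled: the statistics collector keeps per-table counts of how many rows have been inserted, updated, and deleted.)

    -- how much write churn has this table seen since stats were last reset?
    SELECT n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
    WHERE relname = 'mytable';   -- placeholder table name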
Re: [PERFORM] Vacuum Analyze problem
On 9/4/07, [EMAIL PROTECTED] wrote:
> Yes: lots of modifications (INSERT, UPDATE, DELETE) to the table in
> question.

I knew that in the long run the VACUUM ANALYZE command has to be executed again. My question is whether something can happen overnight and cause the need for a new VACUUM ANALYZE (regenerated indexes, or something else related to performance). Thanks for your reply.

Rafael
Re: [PERFORM] Vacuum Analyze problem
In response to [EMAIL PROTECTED]:
> [original question snipped - see the first message in this thread]

To add to Mikko's comments: periodic vacuuming and analyzing is a mandatory part of running a PostgreSQL database server. You'll probably be best served by configuring the autovacuum daemon to handle this for you; see the postgresql.conf config file.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
Phone: 412-422-3463x4023
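(For concreteness - a minimal sketch of the relevant postgresql.conf settings on 8.2; the values shown are illustrative, not tuned recommendations:)

    # autovacuum needs the row-level statistics collector
    stats_start_collector = on
    stats_row_level = on
    autovacuum = on
    autovacuum_naptime = 1min    # how often the daemon checks for work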
[PERFORM] About autovacuum
I have this turned on, and if I look at the log, it runs once a minute, which is fine. But what does it do? I.e., it runs VACUUM, but does it also do an ANALYZE?

-- 
Jean-David Beyer          Registered Linux User 85642.
PGP-Key: 9A2FC99A         Registered Machine 241939.
Shrewsbury, New Jersey    http://counter.li.org
Re: [PERFORM] Vacuum Analyze problem
On Tuesday 04 September 2007 11:27:07, [EMAIL PROTECTED] wrote:
> [original question snipped - see the first message in this thread]

Rafael;

VACUUM ANALYZE performs two tasks at once:

1) VACUUM - this scans the table pages and marks dead row space (rows from old updates or deletes that cannot be needed by any existing transaction) as free, so that the database can re-use (overwrite) that space.

2) ANALYZE - like an Oracle "compute statistics", this updates the system catalogs with current statistics for the table.

The VACUUM will improve queries because the dead space can be re-used. If the table you are having issues with is a high-volume table, the solution is generally to run VACUUM more often - I've seen tables that needed a vacuum every 5 minutes due to significant sustained churn.

The ANALYZE, of course, is key for the planner. If the table is growing rapidly, then running ANALYZE more often will help. If, however, there is lots of churn but little change in the data (i.e. lots of inserts followed by deletes of the same rows), then a straight VACUUM is probably what you need. If the data is changing rapidly, bumping up the statistics target may help - you can raise it for individual columns with ALTER TABLE ... SET STATISTICS, and you can tune autovacuum for a single table via the pg_autovacuum system catalog.

Hope this helps...

/Kevin
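(To make that last suggestion concrete - a sketch with placeholder table and column names; the new target only takes effect the next time the table is analyzed:)

    ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 200;  -- default target is 10 in 8.2
    ANALYZE mytable;  -- re-gather statistics at the new target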
Re: [PERFORM] Vacuum Analyze problem

Thank you all for the information. I'll get to work on it and see what happens. Thanks again,

Rafael

Re: [PERFORM] Vacuum Analyze problem
--- [EMAIL PROTECTED] wrote:
> Thank you all for the information. I'll get to work on it and see what
> happens.

I'll chime in with one last thought about excellent resources on vacuuming:

http://www.postgresql.org/docs/8.2/static/sql-vacuum.html
http://www.postgresql.org/docs/8.2/static/sql-analyze.html
http://www.postgresql.org/docs/8.2/static/routine-vacuuming.html

Regards,
Richard Broersma Jr.
Re: [PERFORM] About autovacuum
In response to Jean-David Beyer:
> I have this turned on, and if I look at the log, it runs once a minute,
> which is fine. But what does it do? I.e., it runs VACUUM, but does it
> also do an ANALYZE?

Yes. If you turn up the debugging level, you'll see detailed log messages about its activities. There were discussions on other lists about improving autovacuum's log messages; I'm pretty sure it will log more helpful information in 8.3.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
Phone: 412-422-3463x4023
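(A sketch of what "turning up the debugging level" means in postgresql.conf - the exact level needed is an assumption here; debug1 through debug5 show progressively more detail:)

    log_min_messages = debug2   # default is 'notice'; debug levels expose autovacuum activity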
Re: [PERFORM] join tables vs. denormalization by trigger
On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote:
> Hi,
> I wonder about differences in performance between two scenarios:
>
> Background:
> - Table A, ~50,000 records
> - Table B, ~3,000,000 records (~20 cols)
> - Table C, ~30,000,000 records (~10 cols)
> - a query every 3 sec., with LIMIT 10
> - Table C depends on Table B, which depends on Table A (int8 foreign
>   keys, btree indexes)
>
> * consider it a read-only scenario (data loaded only at night, with time
>   for a daily VACUUM ANALYZE)
> * I'm required to show records from Table C, but also with some (~5 cols)
>   info from Table B
> * the WHERE clause always contains the foreign key to Table A
> * the WHERE clause may contain 1-10 further search parameters
>
> Scenario A) simply inner join Tables B + C
> Scenario B) with a trigger on insert/update I could push the required
> information from Table B down to Table C, so I would only need to
> select from Table C.
>
> My question:
> 1) From your experience ... how much faster (approximately, in percent)
> do you regard Scenario B compared to A?

You're assuming that B is always going to be faster than A, which certainly isn't a foregone conclusion. Let's say that you average 10 bytes per column. In scenario A, the total data size is then roughly 3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GiB. In scenario B, due to your denormalization, the total data size is more like 30,000,000 * 30 * 10 = 9 GiB, or 2.5 times more raw data. That's a lot of extra disk I/O, unless your database will always fit in memory in both scenarios.

Although you didn't provide enough data to answer with certainty, I would go on the assumption that A is going to be faster than B. But even if it weren't, remember that premature optimization is the root of all evil. If you try A and it doesn't perform fast enough, then you can always try B later to see if it works any better.

-- Mark Lewis
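(A side note on estimating sizes: rather than multiplying rows by column widths, on 8.1+ you can measure the real on-disk footprint directly - the table names here are placeholders:)

    -- total size including indexes and TOAST data
    SELECT pg_size_pretty(pg_total_relation_size('table_b')) AS b_size,
           pg_size_pretty(pg_total_relation_size('table_c')) AS c_size;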
[PERFORM] Performance on 8 CPUs and 32GB of RAM
A client is moving their PostgreSQL db to a brand new Windows 2003 x64 server with 2 quad cores and 32GB of RAM. It is a dedicated server to run 8.2.4.

The server typically will have fewer than 10 users. The primary use of this server is to host a database that is continuously being updated by data consolidation and matching software that hits the server very hard. There are typically eight such processes running at any one time. The software extensively exploits PostgreSQL's native fuzzy string matching. The SQL is dynamically generated by the software and consists of large, complex joins (the structure of the joins changes as the software adapts its matching strategies).

I would like to favour the needs of the data matching software, and the server is almost exclusively dedicated to PostgreSQL. I have made some tentative modifications to the default postgresql.conf file (see below), but I don't think I've scratched the surface of what this new system is capable of. Can I ask - given my client's needs, this new, powerful server, and the fact that the server typically has a small number of extremely busy processes - what numbers would you change, and what would the recommendations be?

Thanks!

Carlo

    max_connections = 100
    shared_buffers = 100000
    work_mem = 1000000
    max_fsm_pages = 204800
    max_fsm_relations = 1500
    vacuum_cost_delay = 40
    bgwriter_lru_maxpages = 100
    bgwriter_all_maxpages = 100
    checkpoint_segments = 64
    checkpoint_warning = 290
    effective_cache_size = 375000
    stats_command_string = on
    stats_start_collector = on
    stats_row_level = on
    autovacuum = on
    autovacuum_naptime = 1min
    autovacuum_vacuum_threshold = 500
    autovacuum_analyze_threshold = 250
    autovacuum_vacuum_scale_factor = 0.2
    autovacuum_analyze_scale_factor = 0.1
Re: [PERFORM] join tables vs. denormalization by trigger
Hello,

I had a similar issue and - after testing - decided to merge tables B and C into a single table. In my case the resulting table contains a large proportion of NULLs, which limits the size increase... You'll have to do some testing with your data to evaluate the performance gain.

Hope to help,

Marc

-----Original Message-----
From: Walter Mauritz
Sent: Tuesday, September 04, 2007 8:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] join tables vs. denormalization by trigger

> [background snipped - quoted in full in the previous reply]
>
> My question:
> 1) From your experience ... how much faster (approximately, in percent)
> do you regard Scenario B compared to A?
> 2) Any other tips for such a read-only scenario?
>
> Thx for any attention :-)
>
> Walter
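(Since the thread never spells Scenario B out, here is a rough sketch of what such a trigger could look like - all table, column, and function names are hypothetical, it assumes the plpgsql language is installed, and a production version would also need a trigger on table_b to keep already-inserted table_c rows in sync:)

    -- copy the denormalized column from table_b into table_c on every write
    CREATE OR REPLACE FUNCTION copy_b_info() RETURNS trigger AS $$
    BEGIN
        SELECT b.info_col INTO NEW.b_info_col   -- hypothetical column names
        FROM table_b b
        WHERE b.id = NEW.b_id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_copy_b_info
        BEFORE INSERT OR UPDATE ON table_c
        FOR EACH ROW EXECUTE PROCEDURE copy_b_info();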
Re: [PERFORM] Performance on 8 CPUs and 32GB of RAM
On 9/4/07, Carlo Stonebanks wrote:
> A client is moving their PostgreSQL db to a brand new Windows 2003 x64
> server with 2 quad cores and 32GB of RAM. It is a dedicated server to
> run 8.2.4.

And what does the drive subsystem look like? All that horsepower isn't going to help if all your data is sitting on an inferior drive subsystem.

> [workload description snipped]
> max_connections = 100
> shared_buffers = 100000
> work_mem = 1000000

Even with only 10 users, 1 gig of work_mem is extremely high (without a unit, work_mem is set in kB on 8.2.x); 10000 would be much more reasonable. OTOH, shared_buffers at 100000 is only setting it to 100 meg, which is pretty small on a machine with 32 gig. Also, I recommend setting values in a more readable form, like 500MB, in postgresql.conf - much easier to read than 100000.

> effective_cache_size = 375000

This seems low by an order of magnitude or two.

But the most important thing is what you've left out: what kind of I/O does this machine have? It's really important for something that sounds like an OLAP server.
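(To make that concrete - a sketch of a starting point for a dedicated 32GB box on 8.2, using the unit syntax 8.2 accepts; these numbers are illustrative assumptions, not values anyone in the thread recommended, and the Windows shared-memory caveat raised below still applies:)

    shared_buffers = 500MB         # keep modest, especially on Windows
    work_mem = 16MB                # per sort/hash operation, per backend
    effective_cache_size = 24GB    # roughly 3/4 of RAM on a dedicated server
    checkpoint_segments = 64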
Re: [PERFORM] Performance on 8 CPUs and 32GB of RAM
Carlo Stonebanks wrote:
> A client is moving their PostgreSQL db to a brand new Windows 2003 x64
> server with 2 quad cores and 32GB of RAM. It is a dedicated server to
> run 8.2.4.

Large shared_buffers and Windows do not mix. Perhaps you should leave the shmem config low, so that the kernel can cache the file pages instead.

> The primary use of this server is to host a database that is
> continuously being updated by data consolidation and matching software
> that hits the server very hard.

It sounds like you will need a whole lot of vacuuming effort to keep up. Maybe you should lower the autovacuum scale factors so that your tables are visited more frequently. A vacuum_cost_delay of 40 sounds like too much, though.

Since you didn't describe your disk configuration, it is most likely not really prepared to handle a high I/O load. Maybe you should fix that.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
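(For example - these specific values are illustrative assumptions, not Alvaro's: lowering the scale factors below the 0.2/0.1 in the posted config makes autovacuum act after a smaller fraction of each table has changed, and a smaller cost delay throttles it less:)

    autovacuum_vacuum_scale_factor = 0.05    # vacuum after ~5% of a table changes
    autovacuum_analyze_scale_factor = 0.02   # analyze after ~2% changes
    vacuum_cost_delay = 10                   # ms; less throttling than 40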
Re: [PERFORM] Performance on 8 CPUs and 32GB of RAM
On 9/4/07, Alvaro Herrera wrote:
> Large shared_buffers and Windows do not mix. Perhaps you should leave
> the shmem config low, so that the kernel can cache the file pages.

Egads, I'd completely missed the word Windows up there. I would highly recommend building the PostgreSQL server on a unixish OS. Even with minimal tuning, I'd expect the same box running Linux or FreeBSD to stomp Windows pretty heavily in the performance department.

But yeah, the I/O - that's the big one. If it's just a single IDE drive or a couple of them, it's not gonna be able to handle much load.
Re: [PERFORM] Vacuum Analyze problem
[EMAIL PROTECTED] writes:
> I knew that in the long run the VACUUM ANALYZE command has to be
> executed again. My question is whether something can happen overnight
> and cause the need for a new VACUUM ANALYZE (regenerated indexes or
> something else related to performance).

The answer to your question is possibly yes, for two reasons:

1) If you're running an autovacuum daemon, it might decide it's time to vacuum the table and kick off a vacuum. That sounds most like what you're describing.

2) If the size of the table changes substantially, becoming much larger (or smaller, though that wouldn't happen just due to deletes unless you run vacuum), then recent versions of Postgres will notice, even if you don't run ANALYZE, and take that into account.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
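(A footnote on point 1 - a quick way to check, on 8.2, whether autovacuum visited a table overnight; 'mytable' is a placeholder name:)

    SELECT relname, last_vacuum, last_autovacuum,
           last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'mytable';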