Re: [PERFORM] Postgresql Hardware - Recommendations
On 9/5/05 6:50 AM, [EMAIL PROTECTED] wrote:
> The questions we are asking ourselves now are:
>
> 1) Intel or AMD (or alternate platform): Are we better off with Xeons or Opterons? Should we consider the IBM OpenPower platform?

Opteron spanks Xeon for database loads. Advantage AMD, and you generally won't have to spend much extra money for the privilege. I've never used Postgres on the IBM OpenPower platform, but I would expect it to perform quite well, certainly better than the Xeons and probably competitive with the Opterons in many respects -- I am not sufficiently knowledgeable to make a definitive recommendation.

> 2) CPUs vs cache: Would you rather have more CPUs or more cache? E.g. 4x Xeon 1MB vs 2x Xeon 8MB

I would expect that cache sizes are relatively unimportant compared to the number of processors, but it would depend on the specifics of your load. Cache coherence is a significant issue for high-concurrency database applications, and a few megabytes of cache here and there will likely make little difference for a 60GB database. Databases spend most of their time playing in main memory, not in cache. The biggest advantage I can see to bigger cache would be connection scaling, in which case you'll probably buy more mileage with more processors. There are a lot of architecture dependencies here. Xeons scale badly to 4 processors; Opterons scale just fine.

> 3) CPUs vs memory: Would you rather have 4x CPUs and 8GB of memory, or 2x CPUs with 16GB of memory?

Uh, for what purpose? CPU and memory are not fungible, so how you distribute them depends very much on your application. You can never have too much memory for a large database, but having extra processors on a scalable architecture is pretty nice too. What they both buy you is not really related. The amount of memory you need is determined by the size of your cache-able working set and the nature of your queries.
Spend whatever money is left on the processors; if your database spends all its time waiting for disks, no quantity of processors will help you unless you are doing a lot of math on the results. YMMV, as always. Recommendations more specific than "Opterons rule, Xeons suck" depend greatly on what you plan on doing with the database.

Cheers,

J. Andrew Rogers

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Improving performance of a query
Carlos wrote:
> SELECT * FROM SSIRRA where
> (YEAR = 2004 and CUSTOMER = 04 and CODE = 00 and PART >= 00) or
> (YEAR = 2004 and CUSTOMER = 04 and CODE > 00) or
> (YEAR = 2004 and CUSTOMER > 04) or
> (YEAR > 2004)
[snip]

ah, the positional query. You can always rewrite this query in the following form:

(YEAR >= 2004) and
(YEAR = 2004 or CUSTOMER >= 04) and
(YEAR = 2004 or CUSTOMER = 04 or CODE >= 00) and
(YEAR = 2004 or CUSTOMER = 04 or CODE = 00 or PART > 00)

This is better because it will index scan using 'year' (not customer or part, though). The true answer is to lobby for/develop proper row constructor support so you can just

SELECT * FROM SSIRRA where (YEAR, CUSTOMER, CODE, PART) > (2004, 04, 00, 00)

This is designed to do what you are trying to do but currently doesn't work quite right. Note: in all these queries, 'order by YEAR, CUSTOMER, CODE, PART' should probably be on the query. Other solution: use cursor/fetch or some type of materialized solution.

Merlin
Re: [PERFORM] When to do a vacuum for highly active table
[EMAIL PROTECTED] (Rigmor Ukuhe) writes:
>> -----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance-[EMAIL PROTECTED] On Behalf Of Markus Benne Sent: Wednesday, August 31, 2005 12:14 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] When to do a vacuum for highly active table
>>
>> We have a highly active table that has virtually all entries updated every 5 minutes. Typical size of the table is 50,000 entries, and entries have grown fat. We are currently vacuuming hourly, and towards the end of the hour we are seeing degradation when compared to the top of the hour. Vacuum is slowly killing our system, as it is starting to take up to 10 minutes, and load at the time of vacuum is 6+ on a Linux box. During the vacuum, the overall system goes unresponsive, then comes back once the vacuum completes.
>
> Play with the vacuum_cost_delay option. In our case it made a BIG difference (going from very heavy hitting to almost unnoticed vacuuming).

That helps only if the ONLY problem you're having is from the direct I/O of the vacuum. If part of the problem is that the table is so large that it takes 4h for VACUUM to complete, thereby leaving a transaction open for 4h, thereby causing other degradations, then vacuum_cost_delay will have a NEGATIVE impact, as it will mean that the vacuum on that table will take even /more/ than 4h. :-(

For the above scenario, it is almost certain that the solution comes in two pieces:

1. VACUUM FULL / CLUSTER to bring the size down. The table has grown fat, and no number of repetitions of plain vacuum will fix this.

2. Do plain vacuum on the table VASTLY more frequently, probably every 5 minutes, possibly more often than that. By doing this, you prevent things from getting so bad again.

By the way, in this sort of situation, _ANY_ transaction that runs more than about 5 minutes represents a serious enemy to performance, as it will tend to cause the hot table to get fatter.
-- (reverse (concatenate 'string gro.gultn @ enworbbc)) http://www.ntlug.org/~cbbrowne/linux.html TECO Madness: a moment of regret, a lifetime of convenience. -- Kent Pitman
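Coming back to point 2, a toy calculation shows why 5-minute vacuums matter so much for this table. This is a hedged Python sketch: the numbers come from Markus's post, but the model is mine, and it ignores free-space reuse between vacuums, so treat it as a worst-case bound rather than a simulation of PostgreSQL internals.

```python
# Toy model of dead-tuple buildup: a 50,000-row table whose rows are
# all updated every 5 minutes. Each update leaves a dead tuple behind
# until the next vacuum makes the space reusable. Ignores free-space
# reuse between vacuums (worst-case bound).

LIVE_ROWS = 50_000
UPDATE_INTERVAL_MIN = 5

def peak_tuples(vacuum_interval_min):
    """Live + dead tuples just before a vacuum runs."""
    cycles = vacuum_interval_min // UPDATE_INTERVAL_MIN
    return LIVE_ROWS + LIVE_ROWS * cycles

print(peak_tuples(60))  # 650000 -- hourly vacuum: 13x the live rows
print(peak_tuples(5))   # 100000 -- 5-minute vacuum: only 2x
```

That 13x-vs-2x gap is the "degradation towards the end of the hour" in the original complaint, and it only shrinks once a VACUUM FULL / CLUSTER has brought the table back to its live size first.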
Re: [PERFORM] Improving performance of a query
On Tue, 6 Sep 2005, Merlin Moncure wrote:

>> Carlos wrote:
>>> SELECT * FROM SSIRRA where
>>> (YEAR = 2004 and CUSTOMER = 04 and CODE = 00 and PART >= 00) or
>>> (YEAR = 2004 and CUSTOMER = 04 and CODE > 00) or
>>> (YEAR = 2004 and CUSTOMER > 04) or
>>> (YEAR > 2004)
>> [snip]
>
> ah, the positional query. You can always rewrite this query in the following form:
>
> (YEAR >= 2004) and
> (YEAR = 2004 or CUSTOMER >= 04) and
> (YEAR = 2004 or CUSTOMER = 04 or CODE >= 00) and
> (YEAR = 2004 or CUSTOMER = 04 or CODE = 00 or PART > 00)

Unless I'm not seeing something, I don't think that's a correct reformulation in general. If customer < 4 and year > 2004, the original clause would return true but the reformulation would return false, since (year = 2004 or customer >= 4) would be false.
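For anyone following along, this mismatch is easy to confirm by brute force. Here's a quick Python sketch (hypothetical values; the '>' and '>=' operators are written out explicitly, since the list archive tends to eat angle brackets):

```python
from itertools import product

# Carlos's original OR-form vs. the first AND-of-ORs rewrite,
# expressed as predicates over a (year, customer, code, part) tuple.

def original(y, cu, co, p):
    return ((y == 2004 and cu == 4 and co == 0 and p >= 0) or
            (y == 2004 and cu == 4 and co > 0) or
            (y == 2004 and cu > 4) or
            (y > 2004))

def rewrite(y, cu, co, p):
    return ((y >= 2004) and
            (y == 2004 or cu >= 4) and
            (y == 2004 or cu == 4 or co >= 0) and
            (y == 2004 or cu == 4 or co == 0 or p > 0))

# Exhaustively compare the two over a small domain.
mismatches = [t for t in product([2003, 2004, 2005], [3, 4, 5],
                                 [-1, 0, 1], [-1, 0, 1])
              if original(*t) != rewrite(*t)]

# The row described above: year past 2004 but customer below 04.
print(original(2005, 3, 0, 0), rewrite(2005, 3, 0, 0))  # True False
```

The brute force also turns up mismatches in the other direction (e.g. year = 2004 with customer < 04, where the rewrite is true but the original is false), so the rewrite really does need the correction that follows later in the thread.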
[PERFORM] insert performance for win32
Hi,

I usually use PostgreSQL coupled with Linux, but I have to use Windows for a particular project. So I wanted to do some tests to know if the performance will be acceptable (I don't need PostgreSQL to be as fast with Windows as with Linux, but it has to be usable...). I started with trying to do lots of inserts, and I'm quite astonished by the catastrophic results...

The test: the computer was the same (my workstation, a PIV Dell with SATA disk), dual boot. The Windows OS is XP. Both OSes run PostgreSQL 8.0.3. Both PostgreSQL clusters (Windows and Linux) have the same tuning (shared_buffers=2, wal_buffers=128, checkpoint_segments=10). Before each test, the clusters are vacuum analyzed and the test database is recreated.

The script is quite dumb:

BEGIN;
CREATE TABLE test (col1 serial, col2 text);
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
... 500,000 times ...
COMMIT;

I know it isn't realistic, but I needed to start with something :)

The results are as follows:

Linux: 1'9''
Windows: 9'38''

What I've tried to solve it, and didn't work:
- deactivate the antivirus on Windows
- fsync=no
- raise the checkpoint_segments value (32)
- remove hyperthreading (who knows...)

I don't know what could cause this (I'm not a Windows admin... at all). All I see is a very high kernel load during the execution of this script, but I can't determine where it comes from. I'd like to know if this is a known problem, if there is something I can do, etc...

Thanks a lot.
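If anyone wants to reproduce this, here's a hypothetical Python snippet that writes out the same kind of dumb script; the helper name and file name are mine, not from the original post.

```python
# Write a test script like the one described above: one transaction
# wrapping N single-row INSERTs. N = 500,000 matches the post; the
# file name is an arbitrary choice.

def make_insert_script(n, path="insert_test.sql"):
    with open(path, "w") as f:
        f.write("BEGIN;\n")
        f.write("CREATE TABLE test (col1 serial, col2 text);\n")
        for _ in range(n):
            f.write("INSERT INTO test (col2) values ('test');\n")
        f.write("COMMIT;\n")
    return path

make_insert_script(500_000)
# then run it with, e.g.:  psql -f insert_test.sql testdb
```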
Re: [PERFORM] Improving performance of a query
>>> Carlos wrote:
>>>> SELECT * FROM SSIRRA where
>>>> (YEAR = 2004 and CUSTOMER = 04 and CODE = 00 and PART >= 00) or
>>>> (YEAR = 2004 and CUSTOMER = 04 and CODE > 00) or
>>>> (YEAR = 2004 and CUSTOMER > 04) or
>>>> (YEAR > 2004)
>>> [snip]
>>
>> ah, the positional query. You can always rewrite this query in the following form:
>>
>> (YEAR >= 2004) and
>> (YEAR = 2004 or CUSTOMER >= 04) and
>> (YEAR = 2004 or CUSTOMER = 04 or CODE >= 00) and
>> (YEAR = 2004 or CUSTOMER = 04 or CODE = 00 or PART > 00)
>
> Unless I'm not seeing something, I don't think that's a correct reformulation in general. If customer < 4 and year > 2004, the original clause would return true but the reformulation would return false, since (year = 2004 or customer >= 4) would be false.

You are correct; you also have to exchange '>=' with '>' when you exchange 'and' with 'or'. The correct answer is:

(YEAR >= 2004) and
(YEAR > 2004 or CUSTOMER >= 04) and
(YEAR > 2004 or CUSTOMER > 04 or CODE >= 00) and
(YEAR > 2004 or CUSTOMER > 04 or CODE > 00 or PART > 00)

It's easy to get tripped up here: the basic problem is how to get the next record based on a multi-part key. My ISAM bridge can write them either way, but the 'and' major form is always faster ;).

Merlin
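The corrected form checks out under the same kind of brute force. In this hypothetical Python sketch (operators spelled out, small made-up domain), the only tuple where the two predicates disagree is the anchor row (2004, 04, 00, 00) itself, which the strict '>' on PART excludes -- reasonable behavior when the point is to fetch the *next* record:

```python
from itertools import product

# Carlos's original OR-form vs. the corrected AND-major form,
# as predicates over a (year, customer, code, part) tuple.

def original(y, cu, co, p):
    return ((y == 2004 and cu == 4 and co == 0 and p >= 0) or
            (y == 2004 and cu == 4 and co > 0) or
            (y == 2004 and cu > 4) or
            (y > 2004))

def corrected(y, cu, co, p):
    return ((y >= 2004) and
            (y > 2004 or cu >= 4) and
            (y > 2004 or cu > 4 or co >= 0) and
            (y > 2004 or cu > 4 or co > 0 or p > 0))

mismatches = [t for t in product([2003, 2004, 2005], [3, 4, 5],
                                 [-1, 0, 1], [-1, 0, 1])
              if original(*t) != corrected(*t)]

print(mismatches)  # [(2004, 4, 0, 0)] -- only the anchor row differs
```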
Re: [PERFORM] insert performance for win32
> Hi, I usually use PostgreSQL coupled with Linux, but I have to use Windows for a particular project. So I wanted to do some tests to know if the performance will be acceptable (I don't need PostgreSQL to be as fast with Windows as with Linux, but it has to be usable...).

In my experience win32 is on par with linux generally, with a few gotchas on either side. Are your times with fsync=no? It's much harder to give an apples-apples comparison with fsync=on for various reasons. Are you running stats_command_string=on? Try disabling it and compare results. Is your loading app running locally or on the server?

I am very interested in discovering sources of high cpu load problems on win32. If you are still having problems, could you get a gprof profile together? There is a recent thread on win32-hackers discussing how to do this.

Merlin
[PERFORM] Poor performance of delete by primary key
Hello,

We are seeing a very strange behavior from postgres. For one of our very common tasks we have to delete records from a table of around 500,000 rows. The delete is by id, which is the primary key. It seems to be consistently taking around 10 minutes to perform. This is totally out of line with the rest of the performance of the database.

This table does have children through foreign keys, but I am pretty sure that all foreign key constraints in the schema have indexes on their children. Sometimes, if we do a vacuum right before running the process, the delete will go much faster. But then the next time we run the task, even just a few minutes later, the delete takes a long time to run.

We deploy the same application also on Oracle. The schemas are pretty much identical. On similar hardware with actually about 4 to 5 times the data, Oracle does not seem to have the same problem. Not that that really means anything, since the internals of Oracle and PostgreSQL are so different, but an interesting fact anyway.

Any ideas on what might be going on?

Thanks,

B.
Re: [PERFORM] Postgresql Hardware - Recommendations
On 9/5/05, [EMAIL PROTECTED] wrote:
> ... The only info I can give so far is that the database size is about 60GB, and that it will be frequently accessed by multiple users (about 100 will be connected during business hours). The applications accessing the database are mostly reporting tools.

Optimizing hardware for mostly selects is different than optimizing for lots of inserts. You will get good responses from this list if you can give a few more details. Here are some questions:

How do you get your data into the db? Do you do bulk loads at periodic intervals during the day? Do you do frequent updates/inserts?

You say reporting; do you use many stored procedures and calculations on the server side? I've used some reporting apps that simply grab tons of data from the server and then process it on the client side (ODBC apps seem to do this), while other applications formulate the queries and use stored procedures in order to transfer little data.

Of your 60GB, how much of that is active? Does your budget allow you to buy enough RAM to get your active data into the disk cache? For reporting, this *might* be your biggest win. Here are some scenarios:

S1: Bulk uploads once or twice daily of about 250 MB of data. Few inserts and updates during the day (1-2%). Reporting is largely done on data from the last 5 business days. In this case you have 2GB of active data and your disk cache will hold all of your active data in RAM (provided your db structure is disk-cache friendly). An example of this I have experienced is a sales application that queries current inventory. Telephone agents queried, queried, queried the in-stock inventory.

S2: Same as above, but reporting is largely done on data covering 200+ business days. It's doubtful that you will get 50GB of RAM in your server; you need to focus on disk speed.
An example of this I have experienced was an application that looked at sales trends and performed commission calculations and projected sales forecasts.

S3: Lots of inserts/updates throughout the day (15-25%) -- you need to focus on disk speed. The content management system my employer develops fits this model.

> 3) CPUs vs memory: Would you rather have 4x CPUs and 8GB of memory, or 2x CPUs with 16GB of memory?

Very hard to say without knowing your application. I have limited experience, but what I've found is that applications that support multiple db architectures do not fully utilize the database server: CPU utilization is low while disk and network i/o is high. I don't know if your application supports multiple backends, but chances are good your biggest wins will come from RAM, disk and network investments.

--
Matthew Nuzum
www.bearfruit.org
Re: [PERFORM] insert performance for win32
> In my experience win32 is on par with linux generally, with a few gotchas on either side. Are your times with fsync=no? It's much harder to give an apples-apples comparison with fsync=on for various reasons.

It is with fsync=off on Windows, fsync=on on Linux.

> Are you running stats_command_string=on? Try disabling it and compare results.

Deactivated on Windows, activated on Linux.

> Is your loading app running locally or on the server?

Yes.

> I am very interested in discovering sources of high cpu load problems on win32. If you are still having problems, could you get a gprof profile together? There is a recent thread on win32-hackers discussing how to do this.

I'll give it a look.

Merlin
Re: [PERFORM] Poor performance of delete by primary key
Brian Choate [EMAIL PROTECTED] writes:
> We are seeing a very strange behavior from postgres. For one of our very common tasks we have to delete records from a table of around 500,000 rows. The delete is by id which is the primary key. It seems to be consistently taking around 10 minutes to perform. This is totally out of line with the rest of the performance of the database.

I'll bet this table has foreign-key references from elsewhere, and the referencing columns are either not indexed, or not of the same datatype as the master column.

Unfortunately there's no very simple way to determine which FK is the problem. (In 8.1 it'll be possible to do that with EXPLAIN ANALYZE, but in existing releases EXPLAIN doesn't break out the time spent in each trigger...) You have to just eyeball the schema :-(.

regards, tom lane
Re: [PERFORM] Poor performance of delete by primary key
On Tue, Sep 06, 2005 at 11:32:00AM -0400, Tom Lane wrote:
> Brian Choate [EMAIL PROTECTED] writes:
>> We are seeing a very strange behavior from postgres. For one of our very common tasks we have to delete records from a table of around 500,000 rows. The delete is by id which is the primary key. It seems to be consistently taking around 10 minutes to perform. This is totally out of line with the rest of the performance of the database.
>
> I'll bet this table has foreign-key references from elsewhere, and the referencing columns are either not indexed, or not of the same datatype as the master column.

Wouldn't setting the FK as deferrable and initially deferred help here too, as then the FK wouldn't be checked until the transaction ended?

Matthew
Re: [PERFORM] Massive performance issues
Hi,

Many thanks for all your thoughts and advice. With just 2GB of RAM, no change to the hard disc (still SATA), but proper tuning of PostgreSQL (still 7.4) and aggressive normalization to shrink row width, I have managed to get suitable performance, with queries on a 5 million row data set, when fully cached, such as:

select to_char(sale_date, 'DD Mon ') as sale_date_text,
       cost,
       property_types.type as property_type,
       sale_types.type as sale_type,
       flat_extra, number, street,
       loc1.component as locality_1,
       loc2.component as locality_2,
       city.component as city,
       county.component as county,
       postcode
from address
inner join (
    select id from address_components where component = 'Woodborough'
) as t1 on locality_1_id = t1.id or locality_2_id = t1.id or city_id = t1.id
inner join (
    select id, street from streets where street = 'Lowdham Lane'
) as t2 on street_id = t2.id
inner join sale_types on sale_types.id = sale_type_id
inner join property_types on property_types.id = property_type_id
inner join address_components as county on county_id = county.id
inner join address_components as city on city_id = city.id
inner join address_components as loc2 on locality_2_id = loc2.id
inner join address_components as loc1 on locality_1_id = loc1.id
order by sale_date desc
limit 11 offset 0

completing within 50ms.

I've also now managed to request that the full production system will have 4GB of RAM (there are still a few queries that don't quite fit in 2GB of RAM) and a 15kRPM SCSI HD. So once again, thanks for all your help. I've literally been pulling my hair out over this, so it's great to have basically got it solved.

Matthew
Re: [PERFORM] Poor performance of delete by primary key
Brian Choate wrote:
> Hello, We are seeing a very strange behavior from postgres. For one of our very common tasks we have to delete records from a table of around 500,000 rows. The delete is by id, which is the primary key. It seems to be consistently taking around 10 minutes to perform. This is totally out of line with the rest of the performance of the database.
> Any ideas on what might be going on?

Well, it sounds like *something* isn't using an index. You say that all your FKs are indexed, but that's something worth checking. Also keep an eye out for type conflicts. If the system is otherwise idle, it might be worthwhile to compare before and after values of pg_stat* (user-tables and user-indexes).

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] insert performance for win32
>> In my experience win32 is on par with linux generally, with a few gotchas on either side. Are your times with fsync=no? It's much harder to give an apples-apples comparison with fsync=on for various reasons.
>
> It is with fsync=off on Windows, fsync=on on Linux.

Well, inside a transaction this shouldn't have mattered anyway.

>> Are you running stats_command_string=on? Try disabling it and compare results.
>
> Deactivated on Windows, activated on Linux.
>
>> Is your loading app running locally or on the server?
>
> Yes.

hm :(. Well, you had me curious, so I went ahead and re-ran your test case and profiled it (on Windows). I got similar results time-wise. It's interesting to note that the command I used to generate the test table before dumping w/inserts,

insert into test select nextval('test_id_seq'), 'test' from generate_series(1,500000);

ran in just a few seconds. Well, I cut the #recs down to 50k and here is the profile trace:

  %   cumulative   self               self     total
 time    seconds   seconds    calls   s/call   s/call  name
10.78       0.62      0.62    50001     0.00     0.00  yyparse
 5.39       0.93      0.31  5101422     0.00     0.00  AllocSetAlloc
 4.52       1.19      0.26   799970     0.00     0.00  base_yylex
 2.78       1.35      0.16       28     0.00     0.00  SearchCatCache
 2.43       1.49      0.14   554245     0.00     0.00  hash_search
 2.26       1.62      0.13    49998     0.00     0.00  XLogInsert
 1.74       1.72      0.10   453363     0.00     0.00  LWLockAcquire
 1.74       1.82      0.10   299988     0.00     0.00  ScanKeywordLookup

This makes me wonder if we are looking in the wrong place. Maybe the problem is coming from psql? More results to follow.

Merlin
Re: [PERFORM] Poor performance of delete by primary key
I had a similar problem, so I downloaded 8.1 from CVS, ran it on a relatively gnarly dev workstation, imported a dump of my 8.0 database, and ran my troublesome queries with the new EXPLAIN ANALYZE. This process took about an hour and worked great, provided that you've actually named your foreign key constraints. Otherwise, you'll find out that there's a trigger for a constraint called $3 that's taking up all of your time, but you won't know what table that constraint is on.

-- Mark

On Tue, 2005-09-06 at 11:32 -0400, Tom Lane wrote:
> Brian Choate [EMAIL PROTECTED] writes:
>> We are seeing a very strange behavior from postgres. For one of our very common tasks we have to delete records from a table of around 500,000 rows. The delete is by id which is the primary key. It seems to be consistently taking around 10 minutes to perform. This is totally out of line with the rest of the performance of the database.
>
> I'll bet this table has foreign-key references from elsewhere, and the referencing columns are either not indexed, or not of the same datatype as the master column.
>
> Unfortunately there's no very simple way to determine which FK is the problem. (In 8.1 it'll be possible to do that with EXPLAIN ANALYZE, but in existing releases EXPLAIN doesn't break out the time spent in each trigger...) You have to just eyeball the schema :-(.
>
> regards, tom lane
Re: [PERFORM] insert performance for win32
> This makes me wonder if we are looking in the wrong place. Maybe the problem is coming from psql? More results to follow.

The problem is not coming from psql. One thing I did notice is that in a 250k-insert transaction the insert time grows with the number of records inserted: time to insert the first 50k recs is about 27 sec, and the last 50k recs is 77 sec. I also confirmed that the size of the table is not playing a role here.

Marc, can you do select timeofday() every 50k recs from linux? Also a gprof trace from linux would be helpful.

Merlin
Re: [PERFORM] Poor performance of delete by primary key
Mark Lewis [EMAIL PROTECTED] writes:
> I had a similar problem, so I downloaded 8.1 from CVS, ran it on a relatively gnarly dev workstation, imported a dump of my 8.0 database, and ran my troublesome queries with the new EXPLAIN ANALYZE. This process took about an hour and worked great, provided that you've actually named your foreign key constraints. Otherwise, you'll find out that there's a trigger for a constraint called $3 that's taking up all of your time, but you won't know what table that constraint is on.

But at least you've got something you can work with. Once you know the name of the problem trigger, you can look in pg_trigger to see which other table it's connected to. Try something like

select tgname, tgconstrrelid::regclass, tgargs
from pg_trigger
where tgrelid = 'mytable'::regclass;

regards, tom lane
[PERFORM] log_duration times
Hi,

We want to discover how to improve the performance of an application, and with that intention I turned on log_duration, log_statement=all, and the timestamp escape character (%t) of log_line_prefix. Subtracting the timestamp of the last SQL statement from the first one, I discovered that the whole application takes about 10 seconds to run -- almost the same time we see at the client workstation. Adding up all the log_duration times, I found almost 3 seconds (30% of the total time).

So, I realized that to improve performance it would be better to discover who is spending the 7 remaining seconds than to make changes in database structure or SQL syntax. How could I discover who is using the 7 remaining seconds? Network? ODBC? Application?

Thanks in advance!

Reimer
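Reimer's accounting can be written down explicitly. A trivial Python sketch, using the figures from the post:

```python
# Where does the wall-clock time go? Figures from the post: the run
# spans ~10 s of log timestamps, but the log_duration lines only add
# up to ~3 s of server-side query execution.

total_elapsed_s = 10.0     # last statement timestamp minus first
sum_of_durations_s = 3.0   # sum of all log_duration values

unaccounted_s = total_elapsed_s - sum_of_durations_s
share_outside_db = unaccounted_s / total_elapsed_s

print(unaccounted_s)                  # 7.0 s spent outside query execution
print(round(share_outside_db * 100))  # 70 (percent): network/ODBC/app time
```

With 70% of the time unaccounted for by the server, the next step is indeed to instrument the client side (driver tracing, network captures) rather than tune the SQL.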
Re: [PERFORM] poor VACUUM performance on large tables
Thomas F. O'Connell:
> Do you have your Free Space Map settings configured appropriately?

Our current FSM settings are:

max_fsm_pages = 500000      # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000    # min 100, ~50 bytes each

> You'll want to run a VACUUM VERBOSE and note the numbers at the end, which describe how many pages are used and how many are needed. max_fsm_pages should be set according to that, and you can set max_fsm_relations based on it, too, although typically one knows roughly how many relations are in a database.

Here are the last two lines from a VACUUM FULL VERBOSE we did when the database was totally full:

INFO: free space map: 82 relations, 532349 pages stored; 632352 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 500000 pages = 2995 kB shared memory.
VACUUM

Based on this, it looks like we could stand to bump up our FSM another couple hundred thousand. Does it buy us anything to reduce the number of FSM relations from the default of 1000?

> have you experimented with pg_autovacuum

We're not using pg_autovacuum. We have our own mechanism that works basically the same as pg_autovacuum, but split into two separate threads, one for large tables and one for small tables. We consider tables to be large if their size exceeds 100MB. Tables are selected for vacuuming if they've changed enough (I can get you actual metrics for what "enough" is, but I don't know off the top of my head). Our main reason for splitting out small vs. large tables was that the large tables take a long time to VACUUM and we didn't want our small tables to go a long time between VACUUMs. Of course, due to the transactional nature of VACUUM, we aren't really gaining much here anyway (this was one of the things we were hoping to address with Hannu's patch, but there are issues with his patch on 8.0.2 that we haven't tracked down yet).

Tom Lane:
> Which PG version, exactly?

We're currently running 8.0.2.

> Plain VACUUM (not FULL) certainly ought not take that long.
> (If you're using VACUUM FULL, the answer is going to be "don't do that.")

Heh, we're definitely not doing a VACUUM FULL. We're doing VACUUM ANALYZE {tablename} exclusively, except when we get close to the transaction id wraparound threshold, when we do a VACUUM ANALYZE of the entire database.

> What maintenance_work_mem (or vacuum_mem in older releases) are you running it under?

It looks like we are using the defaults for work_mem (1024) and maintenance_work_mem (16384). We could certainly bump these up. Is there a good way to determine what settings would be reasonable? I'll note, however, that we had experimented with bumping these previously and not noticed any change in performance.

> Can you get VACUUM VERBOSE output from some of these cases so we can see which phase(s) are eating the time?

I'll get some, but it will take a few more days as we have recently reset our test environment. I can get some sample runs of VACUUM VERBOSE on pg_largeobject in a few hours (it takes a few hours to run) and will post them when I have them.

> It'd also be interesting to watch the output of vmstat or local equivalent --- it might just be that your I/O capability is nearly saturated and VACUUM is pushing the system over the knee of the response curve. If so, the vacuum delay options of 8.0 would be worth experimenting with.

We've been monitoring I/O rates with iostat, and we're generally running around 90% I/O usage after we kick into the rolling delete stage (before we reach that stage, we're running around 20%-50% I/O usage). We are definitely I/O bound, hence trying to find a way to make VACUUM process less data. Our system (the database is on an appliance system) is a dual CPU box, and we're burning about 25% of our CPU time in I/O waits (again, after our rolling delete kicks in). A higher performance I/O subsystem is something we could try.
Our biggest concern with increasing the vacuum delay options is the length of time it currently takes to VACUUM our large tables (and pg_largeobject). Holding a transaction open for these long periods degrades performance in other places.

>> Statistics are also skewed during this process and we have observed the planner choosing sequential scans on tables where it is obvious that an index scan would be more efficient.
>
> That's really pretty hard to believe; VACUUM doesn't affect the statistics until the very end. Can you give some specifics of how the statistics are skewed?

I don't have any hard evidence for this, but we have noticed that at certain times a particular query we run will run for an extremely long time (several hours). Re-running the query with EXPLAIN always shows it using an index scan and it runs very quickly. We haven't been able to catch it with an EXPLAIN in the state where it takes a long time (it's not deterministic). Our assumption is that the planner is taking the wrong path because we can't figure out
Re: [PERFORM] poor VACUUM performance on large tables
Jan Peterson [EMAIL PROTECTED] writes:
> Based on this, it looks like we could stand to bump up our FSM another couple hundred thousand. Does it buy us anything to reduce the number of FSM relations from the default of 1000?

Not a lot; as the comment says, those slots are only about 50 bytes each. (I think the true figure is closer to 70, according to some measurements I did recently on CVS tip, but in any case it's less than 100 bytes apiece.) Still, a byte saved is a byte earned...

> It looks like we are using the defaults for work_mem (1024) and maintenance_work_mem (16384). We could certainly bump these up. Is there a good way to determine what settings would be reasonable?

I'd bump up maintenance_work_mem by a factor of 10 and see if it makes a difference. It should reduce the number of passes over the indexes when vacuuming up lots of deleted rows. If you have lots of RAM you might be able to increase it more, but try that for starters.

regards, tom lane
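For what it's worth, the per-slot sizes quoted here make the earlier "Allocated FSM size ... 2995 kB" line easy to reproduce. A rough Python sketch, assuming max_fsm_pages was 500,000 (which matches the reported ~3 MB) and using 6 bytes per page slot and ~50 bytes per relation slot -- both approximate and version-dependent, as noted above:

```python
# Reproduce the "Allocated FSM size" figure from the VACUUM VERBOSE
# output earlier in the thread. Slot sizes: 6 bytes per FSM page slot,
# roughly 50 bytes per relation slot (approximate, version-dependent).

BYTES_PER_PAGE_SLOT = 6
BYTES_PER_REL_SLOT = 50

def fsm_kb(max_fsm_relations, max_fsm_pages):
    total_bytes = (max_fsm_relations * BYTES_PER_REL_SLOT
                   + max_fsm_pages * BYTES_PER_PAGE_SLOT)
    return total_bytes // 1024

print(fsm_kb(1000, 500_000))  # 2978 kB -- within 1% of the reported 2995 kB
```

The small remaining gap is consistent with the relation slots actually costing closer to 70 bytes apiece, per the measurement mentioned above.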
Re: [PERFORM] Poor performance of delete by primary key
> Unfortunately there's no very simple way to determine which FK is the problem. (In 8.1 it'll be possible to do that with EXPLAIN ANALYZE, but in existing releases EXPLAIN doesn't break out the time spent in each trigger...) You have to just eyeball the schema :-(.

phpPgAdmin has a handy info feature where you can see all tables that refer to the current one. You can always go and steal that query to find them...

Chris