Re: [PERFORM] database tuning
kelvan wrote: hi i need to know all the database overhead sizes and block header sizes etc etc as I have a very complex database to build and it needs to be speed tuned beyond reckoning [snip] I am using postgres 8.1 if anyone can post links to pages containing over head information and index block header information it would be most appreciated as I cannot seem to find anything I'd look to the source if you care that strongly. Don't rely on any info found on the internet unless it explicitly mentions 8.1 - these things change. Have a look in backend/storage/ and backend/access/ I'd guess (not a hacker myself). Some thoughts though: 1. If you care that strongly about performance, start building it with 8.3 2. Does your testing show that index storage overheads are/will be a problem? If not, I'd concentrate on the testing to make sure you've identified the bottlenecks first. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Trouble with LEFT JOIN using VIEWS.
=?ISO-8859-2?Q?Piotr_Gasid=B3o?= [EMAIL PROTECTED] writes: I've just hit problem, that is unusual for me. View definition: SELECT users.id, users.user_name, users.extra IS NOT NULL AS has_extra FROM users; What you've got here is a non-nullable target list, which creates an optimization fence when used in the nullable side of an outer join. The problem is that has_extra should read as NULL in the query output for a sites_secure row that has no match in users_secure_with_has_extra, but making users.extra go to null will not make that happen, so there's a constraint on where the expression can be evaluated. The current planner has no way to deal with that except by restricting the plan structure. We have some ideas about how to fix this, but don't hold your breath ... it's going to take major surgery on the planner, AFAICS. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Trouble with LEFT JOIN using VIEWS.
Hello, I've just hit problem, that is unusual for me. quaker= \d sites Table public.sites Column | Type| Modifiers ---+---+ id| integer | not null default nextval('sites_id_seq'::regclass) site_name | character varying | not null user_id | integer | not null extra | integer | Indexes: sites_pkey PRIMARY KEY, btree (id) sites_site_name_key_unique UNIQUE, btree (site_name text_pattern_ops) sites_user_id_key btree (user_id) quaker= \d users Table public.users Column | Type| Modifiers ---+---+ id| integer | not null default nextval('users_id_seq'::regclass) user_name | character varying | not null extra | integer | Indexes: users_pkey PRIMARY KEY, btree (id) Both tables filled with 100k records of random data. In users id is in range from 1..100k, same in sites. In sites user_id is random, range from 1..150k. I've created views: quaker= \d users_secure View public.users_secure Column | Type| Modifiers ---+---+--- id| integer | user_name | character varying | View definition: SELECT users.id, users.user_name FROM users; quaker= \d users_secure_with_has_extra View public.users_secure_with_has_extra Column | Type| Modifiers ---+---+--- id| integer | user_name | character varying | has_extra | boolean | View definition: SELECT users.id, users.user_name, users.extra IS NOT NULL AS has_extra FROM users; Now, when I do simple query to find all data for sites matching site_name like 'H3bh%' (there are at least one record in sites matching this condition). quaker= explain analyze select s.site_name,u.user_name from sites_secure s left join users_secure_with_has_extra u on u.id = s.user_id where site_name like 'H3bh%' order by site_name limit 10; QUERY PLAN - Limit (cost=3897.02..3897.03 rows=2 width=44) (actual time=430.326..430.331 rows=1 loops=1) - Sort (cost=3897.02..3897.03 rows=2 width=44) (actual time=430.321..430.323 rows=1 loops=1) Sort Key: sites.site_name - Nested Loop Left Join (cost=0.00..3897.01 rows=2 width=44) (actual time=290.103..430.301 rows=1 loops=1) Join Filter: (inner.id = outer.user_id) - Index Scan using sites_site_name_key_unique on sites (cost=0.00..6.01 rows=1 width=16) (actual time=0.044..0.054 rows=1 loops=1) Index Cond: (((site_name)::text ~=~ 'H3bh'::text) AND ((site_name)::text ~~ 'H3bi'::text)) Filter: ((site_name)::text ~~ 'H3bh%'::text) - Seq Scan on users (cost=0.00..1641.00 rows=10 width=20) (actual time=0.007..245.406 rows=10 loops=1) Total runtime: 430.432 ms (10 rows) When I resign from LEFT JOIN users_secure_with_has_extra, and put JOIN instead I've got: quaker= explain analyze select s.site_name,u.user_name from sites_secure s join users_secure_with_has_extra u on u.id = s.user_id where site_name like 'H3bh%' order by site_name limit 10; QUERY PLAN - Limit (cost=9.05..9.06 rows=1 width=24) (actual time=0.112..0.118 rows=1 loops=1) - Sort (cost=9.05..9.06 rows=1 width=24) (actual time=0.106..0.108 rows=1 loops=1) Sort Key: sites.site_name - Nested Loop (cost=0.00..9.04 rows=1 width=24) (actual time=0.073..0.088 rows=1 loops=1) - Index Scan using sites_site_name_key_unique on sites (cost=0.00..6.01 rows=1 width=16) (actual time=0.044..0.050 rows=1 loops=1) Index Cond: (((site_name)::text ~=~ 'H3bh'::text) AND ((site_name)::text ~~ 'H3bi'::text)) Filter: ((site_name)::text ~~ 'H3bh%'::text) - Index Scan using users_pkey on users (cost=0.00..3.02 rows=1 width=16) (actual time=0.019..0.023 rows=1 loops=1) Index Cond: (users.id = outer.user_id) Total runtime: 0.216 ms (10 rows) As explain shows PostgreSQL is using index scan on users, instead of seq scan like in example above. Now. When I use view with no has_extra field (important: field is a simple function on extra field) I get expectable results. Both using indexes. quaker= explain analyze select s.site_name,u.user_name from sites_secure s left join users_secure u on u.id
[PERFORM] Cost-Based Vacuum Delay tuning
Hi, I'm currently trying to tune the Cost-Based Vacuum Delay in a 8.2.5 server. The aim is to reduce as much as possible the performance impact of vacuums on application queries, with the background idea of running autovacuum as much as possible[1]. My test involves vacuuming a large table, and measuring the completion time, as the vacuuming proceeds, of a rather long running application query (involving a table different from the one being vacuumed) which cannot fit entirely in buffers (and the completion time of the vacuum, because it needs not be too slow, of course). I ran my tests with a few combinations of vacuum_cost_delay/vacuum_cost_limit, while keeping the other parameters set to the default from the 8.2.5 tarball: vacuum_cost_page_hit = 1 vacuum_cost_page_miss = 10 vacuum_cost_page_dirty = 20 The completion time of the query is about 16 seconds in isolation. With a vacuuming proceeding, here are the results: vacuum_cost_delay/vacuum_cost_limit (deactivated) 20/200 40/200 100/1000 150/1000 200/1000 300/1000 VACUUM ANALYZE time54 s112 s188 s109 s 152 s 190 s 274 s SELECT time50 s 28 s 26 s 24 s 22 s 20 s 19 s I have noticed that others (Alvaro, Joshua) suggest to set vacuum_cost_delay as low as 10 or 20 ms, however in my situation I'd like to lower the performance impact in application queries and will probably choose 150/1000 where only a +40% is seen - I'm curious if anyone else has followed the same path, or is there any outstanding flaw I've missed here? I'm talking outstanding, as of course any local decision may be different in the hope of favouring a different database/application behaviour. Other than that, it's the results obtained with the design principle of Cost-Base Vacuum Delay, which I find a little surprising. Of course, I think it has been thought through a lot, and my observations are probably naive, but I'm going to throw my ideas anyway, who knows. I'd think that it would be possible to lower yet again the impact of vacuuming on other queries, while keeping a vacuuming time with little overhead, if dynamically changing the delays related to database activity, rather than using fixed costs and delays. For example, before and after each vacuum sleep delay is completed, pg could: - check the amount of currently running queries (pg_stat_activity), and continue sleeping if it is above a configured threshold; by following this path, databases with peak activities could use a threshold of 1 and have zero ressource comsumption for vacuuming during peaks, still having nearly no time completion overhead for vacuuming out of peaks (since the check is performed also before the sleep delay, which would be deactivated if no queries are running); if we can afford a luxury implementation, we could always have a maximum sleep time configuration, which would allow vacuuming to proceed a little bit even when there's no timeframe with low enough database activity - alternatively, pg could make use of some longer term statistics (load average, IO statistics) to dynamically pause the vacuuming - this I guess is related to the host OS and probably more difficult to have working correctly with multiple disks and/or processes running - however, if you want high performance from PostgreSQL, you probably won't host other IO applications on the same disk(s) While I'm at it, a different Cost-Based Vacuum Delay issue: VACUUM FULL also follows the Cost-Based Vacuum Delay tunings. While it makes total sense when you want to perform a query on another table, it becomes a problem when your query is waiting for the exclusive lock on the vacuumed table. Potentially, you will have the vacuuming proceeding slowly because of the Cost-Based Vacuum Delay, and a blocked application because the application queries are just waiting. I'm wondering if it would not be possible to dynamically ignore (or lower, if it makes more sense?) the Cost-Based Vacuum Delay during vacuum full, if a configurable amount of queries are waiting for the lock? (please save yourself from answering you should never run VACUUM FULL if you're vacuuming enough - as long as VACUUM FULL is available in PostgreSQL, there's no reason to not make it as practically usable as possible, albeit with low dev priority) Ref: [1] inspired by http://developer.postgresql.org/~wieck/vacuum_cost/ -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] database tuning
On Fri, 2007-12-07 at 12:45 +1200, kelvan wrote: hi i need to know all the database overhead sizes and block header sizes etc etc as I have a very complex database to build and it needs to be speed tuned beyond reckoning If your need-for-speed is so high, I would suggest using 8.3 or at least looking at the 8.3 documentation. This release is very nearly production and is much faster than 8.1 or 8.2. You may not have realised that Postgres dot releases are actually major releases and have significant speed differences. There's not much to be done about the overheads you mention, so best to concentrate your efforts on index planning for your most frequently executed queries. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Cost-Based Vacuum Delay tuning
On Dec 7, 2007, at 4:50 AM, Guillaume Cottenceau wrote: Hi, I'm currently trying to tune the Cost-Based Vacuum Delay in a 8.2.5 server. The aim is to reduce as much as possible the performance impact of vacuums on application queries, with the background idea of running autovacuum as much as possible[1]. My test involves vacuuming a large table, and measuring the completion time, as the vacuuming proceeds, of a rather long running application query (involving a table different from the one being vacuumed) which cannot fit entirely in buffers (and the completion time of the vacuum, because it needs not be too slow, of course). I ran my tests with a few combinations of vacuum_cost_delay/vacuum_cost_limit, while keeping the other parameters set to the default from the 8.2.5 tarball: vacuum_cost_page_hit = 1 vacuum_cost_page_miss = 10 vacuum_cost_page_dirty = 20 The completion time of the query is about 16 seconds in isolation. With a vacuuming proceeding, here are the results: vacuum_cost_delay/vacuum_cost_limit (deactivated) 20/200 40/200 100/1000 150/1000 200/1000 300/1000 VACUUM ANALYZE time54 s112 s188 s109 s 152 s 190 s 274 s SELECT time50 s 28 s 26 s 24 s22 s 20 s 19 s While you do mention that the table you're running your select on is too big to fit in the shared_buffers, the drop in time between the first run and the rest most likely still reflects the fact that when running those tests successively a good portion of the table will already be in shared_buffers as well as being in the filesystem cache, i.e. very little of the runs after the first will have to hit the disk much. I have noticed that others (Alvaro, Joshua) suggest to set vacuum_cost_delay as low as 10 or 20 ms, however in my situation I'd like to lower the performance impact in application queries and will probably choose 150/1000 where only a +40% is seen - I'm curious if anyone else has followed the same path, or is there any outstanding flaw I've missed here? I'm talking outstanding, as of course any local decision may be different in the hope of favouring a different database/application behaviour. Other than that, it's the results obtained with the design principle of Cost-Base Vacuum Delay, which I find a little surprising. Of course, I think it has been thought through a lot, and my observations are probably naive, but I'm going to throw my ideas anyway, who knows. I'd think that it would be possible to lower yet again the impact of vacuuming on other queries, while keeping a vacuuming time with little overhead, if dynamically changing the delays related to database activity, rather than using fixed costs and delays. For example, before and after each vacuum sleep delay is completed, pg could: - check the amount of currently running queries (pg_stat_activity), and continue sleeping if it is above a configured threshold; by following this path, databases with peak activities could use a threshold of 1 and have zero ressource comsumption for vacuuming during peaks, still having nearly no time completion overhead for vacuuming out of peaks (since the check is performed also before the sleep delay, which would be deactivated if no queries are running); if we can afford a luxury implementation, we could always have a maximum sleep time configuration, which would allow vacuuming to proceed a little bit even when there's no timeframe with low enough database activity - alternatively, pg could make use of some longer term statistics (load average, IO statistics) to dynamically pause the vacuuming - this I guess is related to the host OS and probably more difficult to have working correctly with multiple disks and/or processes running - however, if you want high performance from PostgreSQL, you probably won't host other IO applications on the same disk(s) These ideas have been discussed much. Look in the archives to the beginning of this year. I think the general consensus was that it would be good have multiple autovacuum workers that could be tuned for different times or workloads. I know Alvarro was going to work on something along those lines but I'm not sure what's made it into 8.3 or what's still definitely planned for the future. While I'm at it, a different Cost-Based Vacuum Delay issue: VACUUM FULL also follows the Cost-Based Vacuum Delay tunings. While it makes total sense when you want to perform a query on another table, it becomes a problem when your query is waiting for the exclusive lock on the vacuumed table. Potentially, you will have the vacuuming proceeding slowly because of the Cost-Based Vacuum Delay, and a blocked application because the application queries are just waiting. I'm wondering if it would not be possible to dynamically ignore (or lower, if it makes more sense?) the Cost-Based
Re: [PERFORM] TB-sized databases
On Thursday 06 December 2007 04:38, Simon Riggs wrote: Robert, On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote: If the whole performance of your system depends upon indexed access, then maybe you need a database that gives you a way to force index access at the query level? That sounds like a request for hints, which is OT here, ISTM. If you want to eat peas, and someone suggests you use a knife, can I only argue the validity of using a knife? I'd rather just recommend a spoon. I think you're completly overlooking the effect of disk latency has on query times. We run queries all the time that can vary from 4 hours to 12 hours in time based solely on the amount of concurrent load on the system, even though they always plan with the same cost. Not at all. If we had statement_cost_limit then it would be applied after planning and before execution begins. The limit would be based upon the planner's estimate, not the likely actual execution time. This is nice, but it doesnt prevent slow queries reliably (which seemed to be in the original complaints), since query time cannot be directly traced back to statement cost. So yes a query may vary in execution time by a large factor as you suggest, and it would be difficult to set the proposed parameter accurately. However, the same is also true of statement_timeout, which we currently support, so I don't see this point as an blocker. Which leaves us at the burning question: Would you use such a facility, or would the difficulty in setting it exactly prevent you from using it for real? I'm not sure. My personal instincts are that the solution is too fuzzy for me to rely on, and if it isnt reliable, it's not a good solution. If you look at all of the things people seem to think this will solve, I think I can raise an alternative option that would be a more definitive solution: prevent queries from taking longer than x - statement_timeout. prevent planner from switching to bad plan - hint system prevent query from consuming too many resources - true resource restrictions at the database level I'm not so much against the idea of a statement cost limit, but I think we need to realize that it does not really solve as many problems as people think, in cases where it will help it often will do so poorly, and that there are probably better solutions available to those problems. Of course if you back me into a corner I'll agree a poor solution is better than no solution, so... -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] database tuning
Simon Riggs [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Fri, 2007-12-07 at 12:45 +1200, kelvan wrote: hi i need to know all the database overhead sizes and block header sizes etc etc as I have a very complex database to build and it needs to be speed tuned beyond reckoning If your need-for-speed is so high, I would suggest using 8.3 or at least looking at the 8.3 documentation. This release is very nearly production and is much faster than 8.1 or 8.2. You may not have realised that Postgres dot releases are actually major releases and have significant speed differences. There's not much to be done about the overheads you mention, so best to concentrate your efforts on index planning for your most frequently executed queries. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings Simon Riggs [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Fri, 2007-12-07 at 12:45 +1200, kelvan wrote: hi i need to know all the database overhead sizes and block header sizes etc etc as I have a very complex database to build and it needs to be speed tuned beyond reckoning If your need-for-speed is so high, I would suggest using 8.3 or at least looking at the 8.3 documentation. This release is very nearly production and is much faster than 8.1 or 8.2. You may not have realised that Postgres dot releases are actually major releases and have significant speed differences. There's not much to be done about the overheads you mention, so best to concentrate your efforts on index planning for your most frequently executed queries. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ok heres the thing i dont have a choice i just have to work with whats given whether it is good or not why i need these overheads is for block calculations and and tablespace calculations i have to keep everything in a very very small area on the hdd for head reading speed as the server i am forced to use is a peice of crap so i need to do my calculations to resolve this it is not that i dont know how to do my job i understand effective indexing materlized views and all other effects of database tuning is was my major aspect in my study i just need to know the numbers to do what i have to do. i am new to postgres i have used many other database management systems i know the over heads for all of them just not this one if someone could please be of assisstance. let me give a breef outlay of what i have without breaking my confidentality agreement mac server mac os 10.x postgres 8.2.5 (appologies i just got updated documentation with errors fixed in it) 70gig hdd 5 gig ram 4 cpus (not that it matters as postgres is not multi threading) and i have to support approxmatally anywhere from 5000 - 3 users all using it concurentally as you can see this server wouldnt be my first choice (or my last choice) but as i said i have not choice at this time. the interface programmer and i have come up with ways to solve certian problems in preformance that this server produces but i still need to tune the database if you need any other information for someone to give me the overheads then please ask but i may not be able to tell you regards kelvan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Measuring table and index bloat
One of those things that comes up regularly on this list in particular are people whose performance issues relate to bloated tables or indexes. What I've always found curious is that I've never seen a good way suggested to actually measure said bloat in any useful numeric terms--until today. Greg Sabino Mullane just released a Nagios plug-in for PostgreSQL that you can grab at http://bucardo.org/nagios_postgres/ , and while that is itself nice the thing I found most remarkable is the bloat check. The majority of that code is an impressive bit of SQL that anyone could use even if you have no interest in Nagios, which is why I point it out for broader attention. Look in check_postgres.pl for the check_bloat routine and the big statement starting at the aptly labled This was fun to write section. If you pull that out of there and replace $MINPAGES and $MINIPAGES near the end with real values, you can pop that into a standalone query and execute it directly. Results look something like this (reformatting for e-mail): schemaname | tablename | reltuples | relpages | otta | tbloat | public | accounts | 250 |41667 | 40382 |1.0 | wastedpages | wastedbytes | wastedsize | iname | ituples | 1285 |10526720 | 10 MB | accounts_pkey | 250 | ipages | iotta | ibloat | wastedipages | wastedibytes | wastedisize 5594 | 35488 |0.2 |0 |0 | 0 bytes I'd be curious to hear from those of you who have struggled with this class of problem in the past as to whether you feel this quantifies the issue usefully. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] TB-sized databases
Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: There's something fishy about this --- given that that plan has a lower cost estimate, it should've picked it without any artificial constraints. One final thing I find curious about this is that the estimated number of rows is much closer in the offset 0 form of the query. Since the logic itself is identical, I would have expected the estimated total number of rows for both forms of this query to be identical. Any reason the two plans estimate a different total number of rows? (explain statements for the two forms of the same query from earlier in the thread here: http://archives.postgresql.org/pgsql-performance/2007-12/msg00088.php ) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Cost-Based Vacuum Delay tuning
On Dec 7, 2007, at 10:44 AM, Guillaume Cottenceau wrote: Erik Jones erik 'at' myemma.com writes: vacuum_cost_delay/vacuum_cost_limit (deactivated) 20/200 40/200 100/1000 150/1000 200/1000 300/1000 VACUUM ANALYZE time54 s112 s188 s109 s 152 s 190 s 274 s SELECT time50 s 28 s 26 s 24 s22 s 20 s 19 s While you do mention that the table you're running your select on is too big to fit in the shared_buffers, the drop in time between the first run and the rest most likely still reflects the fact that when These figures don't show a difference between first run and subsequent runs. For each parameter tuning, a couple of runs are fired after database restart, and once the value is approximately constant, it's picked and put in this table. The deactivated shows the (stable, from subsequent runs) figure when vacuum delay is disabled (vacuum_cost_delay parameter quoted), not the first run, if that's where the confusion came from. It was. Is it on pgsql-hackers? I haven't found much stuff in pgsql-performance while looking for vacuum_cost_delay tuning. would be good have multiple autovacuum workers that could be tuned for different times or workloads. I know Alvarro was going to work Sounds interesting. Run the initial archive search against pgsql-general over the last year for a thread called 'Autovacuum Improvements' I'm wondering if it would not be possible to dynamically ignore (or lower, if it makes more sense?) the Cost-Based Vacuum Delay during vacuum full, if a configurable amount of queries are waiting for the lock? (please save yourself from answering you should never run VACUUM FULL if you're vacuuming enough - as long as VACUUM FULL is available in PostgreSQL, there's no reason to not make it as practically usable as possible, albeit with low dev priority) Ok, I won't say what you said not to say. But, I will say that I don't agree with you're conjecture that VACUUM FULL should be made more lightweight, it's like using dynamite to knock a whole in a wall for a window. Thanks for opening a new kind of trol^Hargument against VACUUM FULL, that one's more fresh (at least to me, who doesn't follow the list too close anyway). Just for the record, I inherited a poorly (actually, not would be more appropriate) tuned database, containing more than 90% of dead tuples on large tables, and I witnessed quite some performance improvement while I could fix that. If you really want the VACUUM FULL effect without having to deal with vacuum_cost_delay, use CLUSTER. It also re-writes the table and, AFAIK, is not subject to any of the vacuum related configuration parameters. I'd argue that if you really need VACUUM FULL, you may as well use CLUSTER to get a good ordering of the re-written table. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Cost-Based Vacuum Delay tuning
Erik Jones erik 'at' myemma.com writes: vacuum_cost_delay/vacuum_cost_limit (deactivated) 20/200 40/200 100/1000 150/1000 200/1000 300/1000 VACUUM ANALYZE time54 s112 s188 s109 s 152 s 190 s 274 s SELECT time50 s 28 s 26 s 24 s22 s 20 s 19 s While you do mention that the table you're running your select on is too big to fit in the shared_buffers, the drop in time between the first run and the rest most likely still reflects the fact that when These figures don't show a difference between first run and subsequent runs. For each parameter tuning, a couple of runs are fired after database restart, and once the value is approximately constant, it's picked and put in this table. The deactivated shows the (stable, from subsequent runs) figure when vacuum delay is disabled (vacuum_cost_delay parameter quoted), not the first run, if that's where the confusion came from. running those tests successively a good portion of the table will already be in shared_buffers as well as being in the filesystem cache, i.e. very little of the runs after the first will have to hit A dd sized at the total RAM size is run between each test (not between each parameter tuning, between each *query test*), to remove the OS disk cache effect. Of course, the PostgreSQL caching effect cannot be removed (maybe, it shouldn't, as after all this caching is here to improve performance), but the query is selected to generate a lot of disk activity even between each run (that's why I said a query which cannot fit entirely in buffers). the disk much. I have of course checked that the subsequent runs mean essentially disk activity, not CPU activity. - alternatively, pg could make use of some longer term statistics (load average, IO statistics) to dynamically pause the vacuuming - this I guess is related to the host OS and probably more difficult to have working correctly with multiple disks and/or processes running - however, if you want high performance from PostgreSQL, you probably won't host other IO applications on the same disk(s) These ideas have been discussed much. Look in the archives to the beginning of this year. I think the general consensus was that it Is it on pgsql-hackers? I haven't found much stuff in pgsql-performance while looking for vacuum_cost_delay tuning. would be good have multiple autovacuum workers that could be tuned for different times or workloads. I know Alvarro was going to work Sounds interesting. I'm wondering if it would not be possible to dynamically ignore (or lower, if it makes more sense?) the Cost-Based Vacuum Delay during vacuum full, if a configurable amount of queries are waiting for the lock? (please save yourself from answering you should never run VACUUM FULL if you're vacuuming enough - as long as VACUUM FULL is available in PostgreSQL, there's no reason to not make it as practically usable as possible, albeit with low dev priority) Ok, I won't say what you said not to say. But, I will say that I don't agree with you're conjecture that VACUUM FULL should be made more lightweight, it's like using dynamite to knock a whole in a wall for a window. Thanks for opening a new kind of trol^Hargument against VACUUM FULL, that one's more fresh (at least to me, who doesn't follow the list too close anyway). Just for the record, I inherited a poorly (actually, not would be more appropriate) tuned database, containing more than 90% of dead tuples on large tables, and I witnessed quite some performance improvement while I could fix that. -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36 ---(end of broadcast)--- TIP 6: explain analyze is your friend