Re: [PERFORM][OT] Best suiting OS
Hi Jean-David,

On Mon, 2009-10-05 at 15:37 +0200, Jean-David Beyer wrote:
> Robert Haas wrote (in part):
> > Also, I'd just like to mention that vi is a much better editor than emacs.
> That is not my impression. I have used vi from when it first came out (I used ed before that) until about 1998, when I first installed Linux on one of my machines and started using emacs. I find that for some tasks involving global editing, vi is a lot easier to use. But for most of the things I do on a regular basis, I find emacs better. So, for me, it is not which is the better editor, but which is the better editor for the task at hand.

You are probably absolutely right, but Robert only wanted to point out that this conversation is heading in the flame-war direction, in his subtle way of doing so...

Cheers,
Csaba.
Re: [PERFORM] TB-sized databases
> Isn't that what statement_timeout is for? Since this is entirely based on estimates, using arbitrary fuzzy numbers for this seems fine to me; precision isn't really the goal.

There's an important difference from statement_timeout: this proposal would avoid consuming any resources at all if the planner estimates the query can't be executed in acceptable time, whereas statement_timeout will let a bad query run for at least statement_timeout before cancelling it...

Cheers,
Csaba.
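For reference, a minimal sketch of the existing knob being contrasted here (the table name is hypothetical; on these releases the value is in milliseconds):

    SET statement_timeout = 60000;         -- 60 seconds
    SELECT count(*) FROM some_big_table;   -- cancelled with an error if it runs past 60s,
                                           -- but it still consumes resources until then
    RESET statement_timeout;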
Re: [PERFORM] TB-sized databases
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take. You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then?

It would still be useful in the sense that if the planner is making wrong estimates, you must correct it somehow... raise the statistics target, rewrite the query or do some other tweaking, but you should do something. An error is sometimes better than gradually decreasing performance caused by, for example, a too-low statistics target. So if the error is thrown because of a wrong estimate, it is still a valid error, raising a signal that the DBA has to do something about it.

It's still true that if the planner estimates too low, it will raise no error and will consume the resources. But that's just what we have now, so it wouldn't be a regression of any kind...

Cheers,
Csaba.
Re: [PERFORM] TB-sized databases
On Wed, 2007-11-28 at 08:27 -0500, Bill Moran wrote:
> Is there something wrong with: set enable_seqscan = off ?

Nothing wrong with enable_seqscan = off except that it is an all-or-nothing kind of thing... if you want the big table to never use a seqscan, but a medium table which is joined in should use it, then what do you do? And setting enable_seqscan = off will actually not mean the planner can't use a sequential scan for the query if no other alternative exists. In any case it doesn't mean "please throw an error if you can't do this without a sequential scan".

In fact an even more useful option would be to ask the planner to throw an error if the expected cost exceeds a certain threshold...

Cheers,
Csaba.
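A sketch of how far the existing knob reaches - it can only be flipped per session or per transaction, never per table, and it only penalizes sequential scans in the cost model rather than forbidding them (table names are hypothetical):

    BEGIN;
    SET LOCAL enable_seqscan = off;   -- applies to ALL tables in every query until COMMIT
    SELECT count(*)
    FROM big_table b
    JOIN medium_table m USING (id);   -- the planner may still seq-scan if no index path exists
    COMMIT;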
Re: [PERFORM] TB-sized databases
On Wed, 2007-11-28 at 08:54 -0500, Bill Moran wrote: Nothing wrong with enable_seqscan = off except it is all or nothing type of thing... If that's true, then I have a bug report to file: [snip] It looks to me to be session-alterable. I didn't mean that it can't be set per session, I meant that it is not fine grained enough to select the affected table but it affects _all_ tables in a query... and big tables are rarely alone in a query. Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] autovacuum: recommended?
On Fri, 2007-11-16 at 12:13 +0100, Tobias Brox wrote:
> [snip] should i use both auto-vacuum and manual-vacuum?

I would say for 8.2 that's the best strategy (which might change with 8.3 and its multiple vacuum workers).

> That being said, we have some huge tables in our database and quite a lot of traffic, and we got some real performance problems when autovacuum kicked in and started vacuuming those huge tables, so we're currently running without. Autovacuum can be tuned to not touch those tables, but we've chosen to leave it off.

We are doing that here, i.e. set up autovacuum not to touch big tables, and cover those with nightly vacuums if there is still some activity on them, plus one weekly complete vacuum of the whole DB (vacuum without other params, preferably as the postgres user to cover the system tables too).

In fact we also have a few very frequently updated small tables; those are also covered by very frequent crontab vacuums, because in 8.2 autovacuum can spend quite some time vacuuming some medium-sized tables, and in that interval the small but frequently updated ones get bloated. This should be better with 8.3 and multiple autovacuum workers.

To disable autovacuum for individual tables, search for pg_autovacuum in the docs.

I would say that autovacuum + autovacuum disabled on big tables + nightly vacuum + weekly vacuumdb + frequent crontab vacuum of the hot small tables works well in 8.2. One thing which could be needed is to also schedule continuous vacuum of big tables which are frequently updated, with big delay settings to throttle the resources used by the vacuum. We don't need that here because we don't update our big tables frequently...

Cheers,
Csaba.
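For reference, a sketch of the per-table disabling and the cron side (the table name is made up, and the exact pg_autovacuum column list differs between releases - check the docs for your version before copying this):

    -- 8.2-style: a row in pg_autovacuum with enabled = false stops autovacuum
    -- from touching this table; -1 means "use the global default" for the limits
    INSERT INTO pg_autovacuum
           (vacrelid, enabled,
            vac_base_thresh, vac_scale_factor,
            anl_base_thresh, anl_scale_factor,
            vac_cost_delay, vac_cost_limit,
            freeze_min_age, freeze_max_age)
    VALUES ('big_table'::regclass, false, -1, -1, -1, -1, -1, -1, -1, -1);

    -- the nightly and weekly passes then come from cron, e.g.:
    --   vacuumdb --analyze --table big_table mydb    (nightly, per big table)
    --   vacuumdb --analyze mydb                      (weekly, whole database)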
Re: [PERFORM] Searching for the cause of a bad plan
On Thu, 2007-09-27 at 11:07 -0700, Ron Mayer wrote:
> Csaba Nagy wrote:
> > Well, my problem was actually solved by raising the statistics target,
> Would it do more benefit than harm if postgres increased the default_statistics_target? I see a fair number of people (myself included) asking questions whose resolution was to ALTER TABLE SET STATISTICS; and I think relatively fewer (if any?) people concerned about space in pg_statistic or people improving analyze time by reducing the statistics target.

Well, the cost of raising the statistics target is far from zero: with all defaults the analyze time was ~10 seconds, with one column set to 100 it was ~1.5 minutes, and with one column set to 1000 it was 15 minutes for the table in question (a few hundred million rows). The IO load must have been proportional to the timings... so I'm pretty sure the current default serves most situations well.

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
> Just an idea, but with the 8.3 concurrent scan support, would it be possible to hang a more in-depth analyze off existing sequential scans? Then it would be a lower cost to have higher resolution in the statistics because the I/O component would be hidden.

The biggest problem with that is that it wouldn't be deterministic... the table in question from my original post is never scanned sequentially in normal operation. The other way around is also possible: when sequential scans are too frequent, you wouldn't want to analyze on every one of them either. So there would be a need for logic about when to analyze along with a sequential scan and when to do it proactively without waiting for one... and I'm not sure it would be worth the complexity.

I think it would be much more productive if some long-running-query tracking combined with a background planner thread would do targeted analyzes for specific correlations/distributions/conditions based on what queries are actually running on the system.

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
On Wed, 2007-09-26 at 11:22 -0400, Tom Lane wrote: ... how many values of a are there really, and what's the true distribution of counts? table_a has 23366 distinct values. Some statistics (using R): summary(table_a_histogram) a count Min. : 7857 Min. : 1 1st Qu.:73628 1st Qu.: 9 Median :700011044 Median : 22 Mean :622429573 Mean : 17640 3rd Qu.:700018020 3rd Qu.:391 Max. :83349 Max. :3347707 I'm not sure what you want to see in terms of distribution of counts, so I created 2 plots: a against the counts for each distinct a value, and the histogram of the log of the counts (without the log it's not really readable). I hope they'll make it through to the list... Do the plan estimates get closer to reality if you set a higher statistics target? The results of setting higher statistics targets are attached too. I can't tell if the stats are closer to reality or not, but the plan changes in any case... Cheers, Csaba. attachment: table_a_counts.pngattachment: table_a_counts_histogram.pngdb=# alter table temp_table_a ALTER a set statistics 100; db=# analyze verbose temp_table_a; INFO: analyzing public.temp_table_a INFO: temp_table_a: scanned 3 of 655299 pages, containing 1887 live rows and 0 dead rows; 3 rows in sample, 412183071 estimated total rows db=# select * from pg_stats where tablename = 'temp_table_a'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
Re: [PERFORM] Searching for the cause of a bad plan
On Thu, 2007-09-27 at 10:40 -0400, Tom Lane wrote:
> And yet there's another trap here: if the parameter you passed in chanced to be one of the very common values, a plan that was optimized for a small number of matches would perform terribly. We've speculated about trying to deal with these types of situations by switching plans on-the-fly at runtime, but that's just blue-sky dreaming at the moment. In the short run, if boosting the stats target doesn't result in acceptable plans, there may be no real solution other than to avoid parameterized queries on this column.

Well, my problem was actually solved by raising the statistics target, thanks to Simon for suggesting it. The problem is that it's quite hard (for a non-postgres-developer) to tell which column needs a higher statistics target when a multi-join query doesn't work as expected...

Apropos switching plans on the fly and blue-sky dreaming... IIRC, there were some plans to cache plans in shared mode for the whole cluster, not just per backend. What about allowing the user to prepare a plan offline, i.e. without actually executing it (via some variant of PREPARE CACHED or so), and letting the planner do more exhaustive cost estimation, possibly actually analyzing specific tables for correlations etc., on the grounds that the whole thing is done only once and reused many times? The resulting plan could also contain turning points for parameter values, which would switch between different variants of the plan; this way it could be more specific about parameter values even if planned generically... and it could set up dependencies on the relevant statistics on which it bases its decisions, so it gets invalidated when those statistics presumably change more than some threshold, and possibly a background planner thread re-plans it after the necessary analyze steps are run again.

If there is a background planner, it could also collect long-running-query statistics and automatically create cached plans for the most offending ones, and possibly emit "missing index" or "you should cluster this table" warnings. The fast planner would still be needed for interactive queries which are not yet prepared, so new interactive queries don't pay the unpredictable cost of hard planning. If those run fast enough, they will never get prepared, they don't need to... otherwise they should be passed to the background planner to be exhaustively (or at least more thoroughly) analyzed...

One other thing I dream of would be some way to tell postgres that a query should run in batch mode or interactive mode, i.e. that it should be optimized for best throughput or for fast startup; in the second case great care should also be taken to avoid the worst-case scenarios. I know there's a strong feeling against query hints around here, but this one could fly as a GUC parameter, which could be set in the config file for a default value (batch for a data warehouse, interactive for an OLTP application) and also per session.

Ok, that's about the dreaming...

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
Csaba, please can you copy that data into fresh tables, re-ANALYZE and then re-post the EXPLAINs, with stats data. Here you go, fresh experiment attached. Cheers, Csaba. db=# \d temp_table_a Table public.temp_table_a Column | Type | Modifiers ++--- a | bigint | not null b | bigint | not null Indexes: temp_pk_table_a PRIMARY KEY, btree (a, b) db=# \d temp_table_b1 Table public.temp_table_b1 Column | Type | Modifiers ++--- b | bigint | not null Indexes: temp_pk_table_b1 PRIMARY KEY, btree (b) db=# \d temp_table_b2 Table public.temp_table_b2 Column | Type | Modifiers ++--- b | bigint | not null Indexes: temp_pk_table_b2 PRIMARY KEY, btree (b) Foreign-key constraints: temp_fk_table_b2_b1 FOREIGN KEY (b) REFERENCES temp_table_b1(b) db=# analyze verbose temp_table_a; INFO: analyzing public.temp_table_a INFO: temp_table_a: scanned 3000 of 655299 pages, containing 1887000 live rows and 0 dead rows; 3000 rows in sample, 412183071 estimated total rows db=# analyze verbose temp_table_b1; INFO: analyzing public.temp_table_b1 INFO: temp_table_b1: scanned 3000 of 57285 pages, containing 2232000 live rows and 0 dead rows; 3000 rows in sample, 42620040 estimated total rows db=# analyze verbose temp_table_b2; INFO: analyzing public.temp_table_b2 INFO: temp_table_b2: scanned 57 of 57 pages, containing 41967 live rows and 0 dead rows; 3000 rows in sample, 41967 estimated total rows db=# select * from pg_stats where tablename = 'temp_table_a'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals |most_common_freqs | histogram_bounds| correlation +--+-+---+---++---+-++- public | temp_table_a | a | 0 | 8 | 1261 | {74117,700022128,72317,79411,700023682,76025,72843,700014833,76505,78694} | {0.015,0.0116667,0.0097,0.0097,0.0097,0.009,0.0087,0.008,0.0077,0.0077} | {70010872,70035,73086,75843,78974,700011369,700013305,700015988,700019048,700022257,83151} |0.850525 public | temp_table_a | b | 0 | 8 | -1 | | | {41708986,700707712,803042997,7004741432,7007455842,7009719495,7013869874,7016501748,7019139288,7025078292,7037930133} |0.646759 db=# select * from pg_stats where tablename = 'temp_table_b1'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation +---+-+---+---++--+---+-+- public | temp_table_b1 | b | 0 | 8 | -1 | | | {41719236,801608645,7003211583,7007403678,7011591097,7016707278,7021089839,7025573684,7029316772,7033888226,8002470137} |0.343186 db=# select * from pg_stats where tablename = 'temp_table_b2'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation +---+-+---+---++--+---+---+- public | temp_table_b2 | b | 0 | 8 | -1 | | | {70054,7000352893,7000357745,7000362304,7000367025,7000371629,7000376587,7000381229,7009567724,7023749432,7034300740} | -0.216073 db=# prepare test_001(bigint) as db-# SELECT tb.* db-# FROM temp_table_a ta db-# JOIN
Re: [PERFORM] Searching for the cause of a bad plan
On Mon, 2007-09-24 at 14:27 +0100, Simon Riggs wrote:
> Csaba, please can you copy that data into fresh tables, re-ANALYZE and then re-post the EXPLAINs, with stats data.

Well, I can of course. I actually tried to generate some random data with a similar record count and similar relations between the tables (which I'm not sure I succeeded at), without the extra columns, but it happily yielded the nested loop plan. So I guess I really have to copy the whole data (several tens of GB).

But from my very limited understanding of what information is available to the planner, I thought that the record count estimated for the join between table_a and table_b1 on column b should be something like:

    (estimated record count in table_a for value a)
      * (weight of the b range covered by both table_b1 and table_a)
      / (weight of the b range covered by table_a)

This would hold if the b values in table_a were not correlated at all with the content of table_b2. In reality they are, but the planner has no information about that. I have no idea how the planner works though, so this might be totally off...

I will copy the data and send the results (not promising though that it will be today).

Cheers,
Csaba.
Re: [PERFORM] select count(*) performance (vacuum did not help)
On Mon, 2007-09-24 at 17:14 +0200, Gábor Farkas wrote:
> will i achieve the same thing by simply dropping that table and re-creating it?

If you have an index/PK on that table, the fastest and most useful way to rebuild it is to do CLUSTER on that index. That will be a lot faster than VACUUM FULL and it will also order your table in index order... but it will also lock the table in exclusive mode, just as VACUUM FULL would. If your table has just a few live rows and lots of junk in it, CLUSTER should be fast enough. With 20K entries I would expect it to be fast enough not to be a problem...

Cheers,
Csaba.
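A sketch with made-up names; on the releases discussed here the syntax is CLUSTER index ON table (newer releases also accept CLUSTER table USING index):

    -- rewrites the table in index order, reclaiming all dead space;
    -- holds an exclusive lock on the table while it runs
    CLUSTER my_table_pkey ON my_table;
    ANALYZE my_table;    -- refresh statistics afterwards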
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 09:03 +0100, Gregory Stark wrote:
> Mem: 32945280k total, 32871832k used, 73448k free, 247432k buffers
> Swap: 1951888k total, 42308k used, 1909580k free, 30294300k cached
> It seems to imply Linux is paging out sysV shared memory. In fact some of Heikki's tests here showed that Linux would do precisely that.

But then why is it not reporting that in the "Swap: used" section? It only reports 42308k of used swap.

I have a box where I just executed 3x a select count(*) from a table which is ~5.5 GB on disk, and the count executed in 4 seconds, which I take to mean it is all cached (shared memory is set to 12GB - I use the box for testing for now, otherwise I would set it far lower, because I have had bad experiences with setting it to more than 1/3 of the available memory). Top reported at the end of the process:

Mem: 16510724k total, 16425252k used, 85472k free, 10144k buffers
Swap: 7815580k total, 157804k used, 7657776k free, 15980664k cached

I also watched it during the selects, but it was not significantly different. So my only conclusion is that the reported "cached" value either includes the shared memory or is simply wrong... or I just don't understand how Linux handles memory.

Cheers,
Csaba.
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 10:43 +0100, Gregory Stark wrote:
> The other possibility is that Postgres just hasn't even touched a large part of its shared buffers.

But then how do you explain the example I gave, with a 5.5GB table seq-scanned 3 times, shared buffers set to 12 GB, and top still showing almost 100% of memory as cached and no swap used? In this case you can't say postgres didn't touch its shared buffers - or does a sequential scan not use the shared buffers?

Cheers,
Csaba.
[PERFORM] Searching for the cause of a bad plan
Hi all, Postgres version: 8.2.4 Tables: table_a(a bigint, b bigint, primary key(a, b) ); table_b1(b bigint primary key, more columns...); table_b2(b bigint primary key references table_b1(b), more columns...); table_b1: ~ 27M rows; ~25 more columns; width=309 (as reported by explain select *); table_a: ~400M rows; - column b should reference table_b1, but it does not for performance reasons (it is an insert only table); - column a distinct values: 1148 - has (a, b) as primary key; - has no additional columns; table_b1: ~40K rows; ~70 more columns; width=1788 (as reported by explain select *); Statistics for the involved columns for each table are attached in files (to preserve the spacing). They were taken after analyzing the relevant table (except for table_b2 where I added the fiddled statistics first and then remembered to analyze fresh, resulting in the non_fiddled version, which gives the same result as the fiddled one). The problem query is: prepare test_001(bigint) as SELECT tb.* FROM table_a ta JOIN table_b2 tb ON ta.b=tb.b WHERE ta.a = $1 ORDER BY ta.a, ta.b limit 10; Explain gives Plan 1 (see attached plans.txt) If I set enable_hashjoin=off and enable_mergejoin=off, I get Plan 2 (again, see plans.txt). The difference is a 30x improvement in the second case... (I actually forgot to account for cache effects, but later rerun the queries multiple times and the timings are proportional). Additionally, if I replace table_b2 with table_b1 in the query, I get Plan 3 (with reasonable execution time) with both enable_hashjoin and enable_mergejoin on. So there is something which makes table_b2 different from table_b1 for planning purposes, but I could not identify what that is... they have differences in statistics, but fiddling with the stats gave me no difference in the plan. Looking at Plan 2, it looks like the limit step is estimating wrongly it's cost. I guessed that it does that because it thinks the b values selected from table_a for a given a span a larger range than the b values in table_b2, because the b values in table_b2 are a (relatively small) subset of the b values in table_a. But this is not the case, the query only gets a values for which all the b values in table_a will be found in table_b2. Of course the planner has no way to know this, but then I think it is not the case, as I tried to copy the histogram statistics in pg_statistic for the column b from the entry for table_b1 (which contains the whole span of b values) to the entry for table_b2, with no change in the plan. Just for the record, this query is just a part of a more complex one, which joins in bigger tables, resulting in even worse performance, but I tracked it down to refusing the nested loop to be the problem. Is there anything I could do to convince the planner to use here the nested loop plan ? Thanks, Csaba. attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation -+---+---++-+-+--+- a | 0 | 8 | 1148 | {31826743,31855101,31855343,31854918,31856328,31861573,31855122,31855130,31855189,31856426} | {0.005,0.0047,0.0043,0.004,0.004,0.004,0.0037,0.0037,0.0037,0.0037} | {31734956,31854364,31854732,31855162,31855660,31857144,31858109,31858965,31859762,31860576,31861566} |0.999608 b | 0 | 8 | -1 | | | {63977,36878147,42247866,42548692,42812320,46992026,51444368,55977972,56607708,59496742,68530614} |0.602959 attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 11:59 +0100, Simon Riggs wrote:
> Please re-run everything on clean tables without frigging the stats. We need to be able to trust what is happening is normal.

I did: the plan fiddling happened only after getting the plans from a fresh analyze, and I ran the query again with a fresh analyze just before sending the mail and the plan was the same. In fact I spent almost 2 days playing with the query which triggers this behavior, until I tracked it down to this join. Thing is that we have many queries which rely on this join, so it is fairly important that we understand what happens there.

> Plan 2 sees that b1 is wider, which will require more heap blocks to be retrieved. It also sees b1 is less correlated than b2, so again will require more database blocks to retrieve. Try increasing effective_cache_size.

effective_cache_size is set to ~2.7G, the box has 4G memory. I increased it now to 3.5G but it makes no difference. I increased it further to 4G, no difference again.

> Can you show plans with/without LIMIT and with/without cursor, for both b1 and b2?

The LIMIT is unfortunately an absolutely needed part of the query, it makes no sense to try without. If it were acceptable to do it without the limit, then it is entirely possible that the plan I get now would indeed be better... but it is not acceptable.

Thanks,
Csaba.
Re: [PERFORM] Linux mis-reporting memory
On Fri, 2007-09-21 at 11:34 +0100, Heikki Linnakangas wrote:
> Which version of Postgres is this? In 8.3, a scan like that really won't suck it all into the shared buffer cache. For seq scans on tables larger than shared_buffers/4, it switches to the bulk read strategy, using only a few buffers, and choosing the starting point with the scan synchronization facility.

This was on 8.1.9, installed via apt-get on Debian 4.1.1-21. In any case I'm pretty sure Linux swaps shared buffers, as I always got worse performance with shared buffers set to more than about 1/3 of the memory. But in that case the output of top is misleading.

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 12:34 +0100, Simon Riggs wrote:
> On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:
> > > Can you show plans with/without LIMIT and with/without cursor, for both b1 and b2?
> > The limit is unfortunately an absolutely needed part of the query
> Understood, but not why I asked...

Well, the same query without limit goes:

dbdop=# explain execute test_001(31855344);
                              QUERY PLAN
 Sort  (cost=322831.85..322831.94 rows=36 width=1804)
   Sort Key: ta.a, ta.b
   ->  Hash Join  (cost=3365.60..322830.92 rows=36 width=1804)
         Hash Cond: (ta.b = tb.b)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..314541.78 rows=389648 width=16)
               Index Cond: (a = $1)
         ->  Hash  (cost=524.71..524.71 rows=41671 width=1788)
               ->  Seq Scan on table_b2 tb  (cost=0.00..524.71 rows=41671 width=1788)

I'm not sure what you mean by "without cursor" - maybe not using prepare? Well, we set up the JDBC driver to always prepare the queries, as this gives us much better worst-case plans than letting postgres see the parameter values, especially for queries with limit. So I simulate that when explaining the behavior we see. All our limit queries are for interactive display, so the worst case is of much higher importance for us than the mean execution time... unfortunately postgres has a tendency to pick the best mean-performance path rather than avoid the worst case, and it is not easy to convince it otherwise.

Cheers,
Csaba.
Re: [PERFORM] Searching for the cause of a bad plan
[snip]

Ok, I was not able to follow your explanation, it goes too deep for me into what the planner does...

> Incidentally, the way out of this is to improve the stats by setting stats target = 1000 on column a of ta. That will allow the optimizer to have a better estimate of the tail of the distribution of a, which should then be more sensibly reflected in the cost of the Index Scan. That doesn't solve the actual problem, but should help in your case.

OK, I can confirm that. I set the statistics target for column a on table_a to 1000, analyzed, and got the plan below. The only downside is that analyze became quite expensive on table_a: it took 15 minutes and touched half of the pages... I will experiment with lower settings, maybe it will work with less than 1000 too.

db=# explain analyze execute test_001(31855344);
                              QUERY PLAN
 Limit  (cost=0.00..4499.10 rows=10 width=1804) (actual time=103.566..120.363 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..344630.97 rows=766 width=1804) (actual time=103.563..120.359 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..67097.97 rows=78772 width=16) (actual time=71.965..77.284 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.51 rows=1 width=1788) (actual time=21.526..21.528 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 120.584 ms

Thanks,
Csaba.
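For reference, the commands behind this, using the names from the thread (the follow-up found that a target of 100 is already enough):

    ALTER TABLE table_a ALTER COLUMN a SET STATISTICS 1000;
    ANALYZE VERBOSE table_a;   -- expensive on a big table: a larger target samples many more pages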
Re: [PERFORM] Searching for the cause of a bad plan
> OK, I can confirm that. I set the statistics target for column a on table_a to 1000, analyzed, and got the plan below. The only downside is that analyze became quite expensive on table_a: it took 15 minutes and touched half of the pages... I will experiment with lower settings, maybe it will work with less than 1000 too.

So, just to finish this up: setting the statistics target to 100 worked too, and it has an acceptable impact on analyze. My original (more complicated) query is working fine now, with visible effects on server load...

Thanks Simon for your help!

Cheers,
Csaba.
Re: [PERFORM] How to use a trigger to write rows to a remote server
On Wed, 2007-07-18 at 15:36, Michael Dengler wrote:
> Row X is inserted into TableX in DB1 on server1 -> TableX trigger function fires and contacts DB2 on server2 and inserts the row into TableY on server2.

This kind of problem is usually solved more robustly by inserting the change into a local table and letting the remote server (or some external program) poll that periodically and make the necessary changes on the remote server. This method does not have the problems Heikki mentions in his reply with disconnections and transaction rollbacks, as the external program/remote server will only see committed transactions, and it can apply the accumulated changes after the connection is recovered in case of failure, without blocking activity on the master.

This is also covered in a few past posts on the postgres lists (I guess you should look in the general list for that); in particular you could be interested in notifications if you want your poller to react immediately when a change occurs.

Cheers,
Csaba.
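A minimal sketch of the "local change table + poller" approach; all names are illustrative, not from the original post:

    -- on server1, next to TableX
    CREATE TABLE tablex_changes (
        id    bigserial PRIMARY KEY,   -- gives the poller a processing order
        x_id  bigint NOT NULL,         -- key of the changed TableX row
        op    char(1) NOT NULL         -- 'I', 'U' or 'D'
    );

    CREATE OR REPLACE FUNCTION tablex_log_insert() RETURNS trigger AS $$
    BEGIN
        INSERT INTO tablex_changes (x_id, op) VALUES (NEW.id, 'I');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER tablex_log_change
        AFTER INSERT ON tablex
        FOR EACH ROW EXECUTE PROCEDURE tablex_log_insert();

    -- an external program periodically reads tablex_changes in id order,
    -- applies the rows to TableY on server2, then deletes what it processed;
    -- it only ever sees committed changes, so rollbacks need no special handling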
Re: [PERFORM] Foreign Key Deadlocking
> A frequently mentioned approach to avoid this point of contention is to have a totals record and have the triggers insert delta records; to get the sum, add them all. Periodically, take the deltas and apply them to the totals.

This is what we do here too. There is only one exception to this rule: in one case we actually need to have the inserted records and the updated parent in one transaction for data consistency, and in that case the delta approach won't work... we didn't find any other solution to that except patching postgres not to lock the parent keys at all, which has its own problems too (occasional breakage of the foreign key relationship when the parent is deleted and a child still slips in, but in our case this is too rare to cause problems which cannot be cleaned up with relative ease - not to mention that there could be other problems we haven't discovered yet or which our usage patterns happen to avoid).

Cheers,
Csaba.
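A sketch of the totals + deltas pattern (illustrative names, not from the thread):

    CREATE TABLE counter_totals (name text PRIMARY KEY, total bigint NOT NULL);
    CREATE TABLE counter_deltas (name text NOT NULL, delta bigint NOT NULL);

    -- triggers (or the application) only ever INSERT into counter_deltas,
    -- so concurrent writers never fight over a single totals row
    INSERT INTO counter_deltas VALUES ('my_counter', 1);

    -- reading the current value means adding the deltas to the total
    SELECT coalesce((SELECT total FROM counter_totals WHERE name = 'my_counter'), 0)
         + coalesce((SELECT sum(delta) FROM counter_deltas
                      WHERE name = 'my_counter'), 0) AS current_value;

    -- a periodic job folds the deltas into counter_totals and deletes
    -- the delta rows it summed up, all in one transaction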
Re: [PERFORM] Foreign Key Deadlocking
> Can someone confirm that I've identified the right fix?

I'm pretty sure that won't help you... see:

http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php

The deadlock will be there if you update/insert the child table and update/insert the parent table in the same transaction (even if you update some field on the parent table other than the key referenced by the child table). If your transactions always update/insert only one of those tables, it won't deadlock (assuming you order the inserts/updates properly per PK).

Cheers,
Csaba.
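A sketch of the interleaving being described, with made-up tables; on 8.1+ the FK check takes a shared lock on the referenced parent row:

    CREATE TABLE parent (id int PRIMARY KEY, info text);
    CREATE TABLE child  (id int PRIMARY KEY, parent_id int REFERENCES parent(id));
    INSERT INTO parent VALUES (100, 'x');

    -- session 1: BEGIN; INSERT INTO child VALUES (1, 100);
    --            (FK check -> shared lock on parent row 100)
    -- session 2: BEGIN; INSERT INTO child VALUES (2, 100);
    --            (shared lock too, does not block)
    -- session 1: UPDATE parent SET info = 'a' WHERE id = 100;
    --            (needs an exclusive row lock -> waits for session 2)
    -- session 2: UPDATE parent SET info = 'b' WHERE id = 100;
    --            (waits for session 1 -> ERROR: deadlock detected)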
Re: [PERFORM] Using the 8.2 autovacuum values with 8.1
On Thu, 2007-02-22 at 22:53, Mark Stosberg wrote:
> Thanks to everyone for the feedback about vacuuming. It's been very useful. The pointers to the pgstattuple and pgFouine tools were also helpful. I'm now considering the following plan for trying autovacuuming again with 8.1. I'd like any peer review you have to offer:
> 1. First, I'll move the settings to match the defaults in 8.2. The ones I noticed in particular were:
>    autovacuum_vacuum_threshold changes: 1000 -> 500
>    autovacuum_analyze_threshold changes: 500 -> 250
>    autovacuum_scale_factor changes: .4 -> .2
>    autovacuum_analyze_scale_factor changes: .2 -> .1
> 2. Try the vacuum cost delay feature, starting with a 20ms value: autovacuum_vacuum_cost_delay = 20
> 3. Immediately add a row to pg_autovacuum for a huge logging table that would be too slow to vacuum usually. We'll still vacuum it once a week for good measure by cron.
> 4. For good measure, I think I'll still keep the nightly cron entry that does a complete vacuum analyze (except for that large table...).
> Seem like a reasonable plan?

You likely don't need the nightly full vacuum run... we also do a nightly vacuum here beside autovacuum, but not a full one: only for tables which are big enough that we don't want autovacuum to touch them during peak business hours, but which change enough that we want them vacuumed frequently. I find them by checking the stats, for example:

SELECT c.relname, c.reltuples::bigint as rowcnt,
       pg_stat_get_tuples_inserted(c.oid) AS inserted,
       pg_stat_get_tuples_updated(c.oid) AS updated,
       pg_stat_get_tuples_deleted(c.oid) AS deleted
FROM pg_class c
WHERE c.relkind = 'r'::char
GROUP BY c.oid, c.relname, c.reltuples
HAVING pg_stat_get_tuples_updated(c.oid) + pg_stat_get_tuples_deleted(c.oid) > 1000
ORDER BY pg_stat_get_tuples_updated(c.oid) + pg_stat_get_tuples_deleted(c.oid) DESC;

The top tables in this list for which (deleted + updated) / rowcnt is relatively small but still significant need your attention for the nightly vacuum... the rest is handled just fine by autovacuum. On the other end of the scale, if you have tables for which the deletion/update rate is way higher than the row count, that's likely a hot-spot table which probably needs extra vacuuming during the day.

Cheers,
Csaba.
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 01:35, Karen Hill wrote:
> [snip] So far I've been sitting here for about 2 million ms waiting for it to complete, and I'm not sure how many inserts postgres is doing per second.

One way is to run analyze verbose on the target table and see how many pages it has, then do it again 1 minute later and check by how many pages it grew. Then multiply the page increase by the rows-per-page ratio you can get from the same analyze output, and you'll have an estimated growth rate. Of course this will only work if you didn't have lots of free space in the table to start with... if you do have lots of free space, you can still estimate the growth based on the analyze results, but it will be more complicated.

In any case, it would be very nice to have more tools to attach to running queries and see how they are doing... starting with what exactly they are doing (are they in RI checks maybe?), the actual execution plan they are using, how much of their work they have done... it would help a lot in debugging performance problems.

Cheers,
Csaba.
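A sketch of the estimate with made-up numbers (the INFO lines are paraphrased, not actual output):

    ANALYZE VERBOSE target_table;
    -- INFO: scanned ... of 10000 pages, containing 600000 live rows ...
    -- ... one minute later:
    ANALYZE VERBOSE target_table;
    -- INFO: scanned ... of 10600 pages, ...
    -- growth: 600 pages/min * (600000 rows / 10000 pages) = ~36000 rows inserted per minute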
Re: [PERFORM] Keeping processes open for re-use
On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:
> [snip] Is there a way that we can achieve this in Postgres? We have a situation whereby we have lots of web based users doing short quick queries and obviously the start up time for a process must add to their perceived response time.

Yes: google for "connection pooling". Note that different solutions exist for different programming languages, so you should look for connection pooling for the language you're using.

HTH,
Csaba.
Re: [PERFORM] Hints proposal
> 2d) Hints will damage the ongoing development of the optimizer by reducing or eliminating test cases for its improvement.

You have no evidence for this. The mindset of the postgres community you cite further below usually mandates that you only state things you have evidence for... and this one could even be backwards: putting such a tool in normal mortals' hands lets them experiment with execution plans to see which one works better, thus giving the developers more data than is possible now. This is of course a speculation too, but not at all weaker than yours.

> 2e) Hints will divert developer resource away from ongoing development of the optimizer.

This is hard to dispute, although the long-term cost/benefit is not clear. And I would guess simple hinting would not need a genius to implement the way planner optimizations mostly do... so it could possibly be done by somebody other than the core planner hackers (are there any more of them than Tom?), and thus not distract them too much from the planner optimization tasks.

> 2f) Hints may demoralize the developer community - many of whom will have been attracted to Postgres precisely because this was a realm where crude solutions were discouraged.

I still don't get why you are so against hints. Hints are a crude solution only if you design them to be like that... otherwise they are just another tool to get the work done, preferably now.

> I understand that these points may seem a bit 'feel-good' and intangible - especially for the DBA's moving to Pg from Oracle, but I think they illustrate the mindset of the Postgres developer community, and the developer community is, after all - the primary reason why Pg is such a good product.

I fail to see why a hinted postgres would be an inferior product...

> Of course - if we can find a way to define 'hint like' functionality that is more in keeping with the 'Postgres way' (e.g. some of the relation level statistical additions as discussed), then some of 2d-2f) need not apply.

I bet most of the users who wanted hints would be perfectly fine with any variation of it, as long as it solves the problem at hand.

Cheers,
Csaba.
Re: [PERFORM] Hints proposal
OK, I just have to comment...

> Jim C. Nasby [EMAIL PROTECTED] writes:
> > These hints would outright force the planner to do things a certain way. ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
> This proposal seems to deliberately ignore every point that has been made *against* doing things that way. It doesn't separate the hints from the queries, it doesn't focus on fixing the statistical or cost misestimates that are at the heart of the issue, and it takes no account of the problem of hints being obsoleted by system improvements.

But whatever arguments you make about planner improvements and the like, it will NEVER be possible to correctly estimate the statistics for a query in all cases, even if you know perfectly WHAT statistics you need, which is also not always the case. Tom, you're the one who knows best how the planner works... can you bet anything you care about on the planner one day never generating a catastrophic plan without DBA tweaking? And how far in the future is that point?

Until that point is reached, the above proposal is one of the simplest to understand for the tweaking DBA, and the fastest to deploy when faced with a catastrophic plan. And I would guess it is one of the simplest to implement and probably not very high maintenance either, although this is just a guess.

If I could hint some of my queries, I would enable anonymous prepared statements to take the parameter values into account, but I can't, because that results in runaway queries every now and then, so I had to force postgres to generate generic plans without knowing anything about parameter values... so the net effect for me is an overall slower postgres system, because I couldn't fix the particular problems I had and had to tweak general settings instead. And when I have a problem I can't wait until the planner is fixed, I have to solve it immediately... the current means to do that are suboptimal.

The argument that planner hints would hide problems from being solved is a fallacy. Putting a hint in place requires almost the same amount of analysis from the DBA as solving the problem does now, so users who ask for help now will keep doing so even with hints available. The ones who wouldn't are not coming for help now either, they know their way out of their problems... and the ones who still report a shortcoming of the planner will do it with hints too. I would even say it would be an added benefit, because then you could really see how well a specific plan does without the planner being able to generate that plan on its own... so knowledgeable users could come to you further down the road, when they know where the planner is wrong, saving you time.

I must say it again: this kind of query-level hinting would be the easiest for developers to understand... there are many trial-and-error type programmers out there; if you get a hint wrong, you fix it and move on, it doesn't need to be perfect, it just has to be good enough. I strongly doubt that postgres will get bad publicity because user Joe shot himself in the foot with bad hints... the probability of that is low, you must actively put those hints there, and if you take the time to do that then you're not the average Joe, and probably not so lazy either; and if you're putting in random hints, then you would probably mess things up some other way anyway.

And the argument about missing out on new features is also not well founded.
If I wanted to exclude a full table scan on a specific table for a specific query, it is all but certain that I want that regardless of what new features postgres will offer in the future. Picking one specific access method is more prone to missing new access methods, but even then, when I upgrade the DB server to a new version I usually have enough other compatibility problems (so far I have had some on every upgrade) that making a round of updating the hints is not an outstanding extra burden. And if the application works well enough with suboptimal plans, why would I even take that extra effort?

I guess the angle is: I, as a practicing DBA, would like to be able to experiment and get the most out of the imperfect tool I have, while you, the developers, want to make the tool perfect... I don't care about perfect tools, they just have to do the job... hints or anything else, if I can make it work GOOD ENOUGH, it's all fine. And hints are something I would understand and be able to use.

Thanks for your patience if you're still reading this...

Cheers,
Csaba.
Re: [HACKERS] [PERFORM] Hints proposal
> Hmmm, if you already understand Visual Basic syntax, should we support that too? Or maybe we should support MySQL's use of '0000-00-00' as the zero date because people understand that?

You completely misunderstood me... I have no idea about Oracle hints, I have never used Oracle in fact. My company uses Oracle, but I have only very limited contact with Oracle issues, and never touched a hint. I'm only talking about ease of use, learning curves, and complexity in general.

While I do like the idea of an all-automatic system optimizer which takes your query portfolio, analyzes the data based on those queries and creates all the indexes you need and all that, that's not going to happen soon, because it's a very complex thing to implement. The alternative is that you take your query portfolio, analyze it yourself, figure out what statistics you need, create indexes, tweak queries, hint the planner about correlations and such... which is a complex task, and if you have to tell the server about some correlation with the phase of the moon, you're stuck, because there will never be a DB engine which understands that. But you can always put the corresponding hint in the query when you know the correlation is there... The problem is that the application sometimes really does know better than the server, when the correlations are not standard.

> We're just not going to adopt a bad design because Oracle DBAs are used to it. If we wanted to do that, we could shut down the project and join a proprietary DB staff.

I have really nothing to do with Oracle. I think you guys are simply too blinded by Oracle hate... I don't care about Oracle.

> The current discussion is: a) Planner tweaking is sometimes necessary; b) Oracle HINTS are a bad design for planner tweaking;

While there are plenty of arguments you have made against query-level hints (can we not call them Oracle hints?), there are plenty of postgres users who have said they would like them. I guess they were tweaking postgres installations when they needed it, not Oracle installations. I said it clearly: for me query-level hinting would give more control and a better understanding of what I have to do to get the desired result. Perfect planning - forget it, I only care about "good enough" with reasonable tuning effort. If I have to tweak statistics, I will NEVER be sure postgres won't backfire on me again. On the other hand, if I say "never do a seq scan on this table for this query", I can be sure it won't...

> c) Can we come up with a good design for planner tweaking?

Angles again: "good enough now" is better for end users, but programmers always go for "perfect tomorrow"... pity.

Cheers,
Csaba.
Re: [PERFORM] Hints proposal
> I'm not suggesting that we do that, but it seems better than embedding the hints in the queries themselves.

OK, what about this: if I execute the same query from a web client, I want the not-so-optimal-but-safe plan; if I execute it asynchronously, I let the planner choose the best-overall-performance-but-sometimes-slow plan. What kind of statistics or table-level hinting will get you that? I would say only query-level hinting buys you query-level control, and that's perfectly appropriate in some situations.

I really can't see why a query-level hinting mechanism is so evil, why it couldn't be kept forever and augmented with the possibility of correlation hinting or table-level hinting. These really solve different problems, with some overlap...

Cheers,
Csaba.
Re: [PERFORM] Unsubscribe
On Wed, 2006-10-04 at 18:02, Csaba Nagy wrote:
> If we didn't want to add it for each list we could just add a link here: http://www.postgresql.org/community/lists/subscribe

OK, now that I took a second look at that page, it does contain unsubscription info... but it's well hidden from a cursory glance... the caption is a big "Subscribe to Lists"; you wouldn't think at first glance that the form is actually used to unsubscribe too, would you? So maybe it's just that the text should be more explicit about what it actually does...

Cheers,
Csaba.
Re: [PERFORM] slow queue-like empty table
On Thu, 2006-09-28 at 09:36, Tobias Brox wrote:
> [Tobias Brox - Thu at 08:56:31AM +0200]
> It really seems like some transaction is still viewing the queue, since it found 38k of non-removable rows ... but how do I find the pid of the transaction viewing the queue? As said, the pg_locks didn't give me any hints ...

The open transaction doesn't have to hold any locks on your queue table to prevent dead rows from being vacuumed. Its mere existence is enough... MVCC means that a still-running transaction could still see those dead rows, so VACUUM can't remove them until there is no transaction which started before they were deleted. So long-running transactions are your enemy when it comes to high insert/delete rate queue tables. Check for "idle in transaction" sessions, those are bad... as is any other long-running transaction.

> Dropping the table and recreating it solved the immediate problem, but there must be some better solution than that? :-)

If you must have long-running transactions on your system (vacuuming another big table also qualifies as one, though this is fixed in 8.2), you could use CLUSTER (see the docs), which is currently not MVCC-conforming and reclaims all the dead space regardless of whether any other running transaction can see it or not. This is only acceptable if your application handles the queue table independently, not mixed into complex transactions.

Also note that the CLUSTER command takes an exclusive lock on the table, so it won't work e.g. during a pg_dump: it would keep the queue table locked for the whole duration of the pg_dump (it won't actually be able to get the lock, but it will block any other activity on the table, as it looks like pending exclusive lock requests block any new shared lock requests).

HTH,
Csaba.
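A sketch for spotting the offending sessions on 8.1/8.2 (requires stats_command_string = on, otherwise current_query is not populated):

    SELECT procpid, usename, query_start, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
    ORDER BY query_start;
    -- the same PID showing up here for a long time is the one to hunt down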
Re: [PERFORM] autovacuum on a -mostly- r/o table
On Wed, 2006-09-27 at 18:08, Edoardo Ceccarelli wrote:
> How can I configure the vacuum to run after the daily batch insert/update?

Check out this: http://www.postgresql.org/docs/8.1/static/catalog-pg-autovacuum.html

By inserting the right row there you can stop autovacuum from vacuuming your big tables, and then you can schedule a nightly vacuum for those just as before. There's still a benefit in that you don't need to care about vacuuming the rest of the tables, which will be done just in time.

Cheers,
Csaba.
Re: [PERFORM] Optimizing DELETE
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered DELETE) on a trivially small table but with many foreign key references (on similar-sized tables), and I'm waiting for it to finish. It's been 10 minutes now, which seems very excessive for a table of 9000 rows on a 3 GHz desktop machine.

If you have missing indexes on the child tables' foreign key columns, that might be the cause of the slow delete. The cascading delete must look up the rows to be deleted in all child tables, and that will be done with sequential scans if you don't have the proper indexes. Try an explain analyze for deleting a single row; that will also show you the time spent in triggers, which might clue you in on what's taking so long.

Cheers,
Csaba.
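A sketch with hypothetical names: index the referencing columns so the FK triggers don't have to seq-scan the child tables, then check where the time goes.

    CREATE INDEX child_a_parent_id_idx ON child_a (parent_id);
    CREATE INDEX child_b_parent_id_idx ON child_b (parent_id);

    -- trigger times are reported after the plan itself
    EXPLAIN ANALYZE DELETE FROM parent_table WHERE id = 42;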
Re: [PERFORM] Help tuning autovacuum - seeing lots of relation bloat
On Wed, 2006-06-21 at 17:27, jody brownell wrote:
> Our application is broken down quite well. We have two main writing processes writing to two separate sets of tables. No crossing over, nothing to prohibit the vacuuming in the nature which you describe.

It really doesn't matter which tables you are touching, and it doesn't matter if you read or write either; what matters is how long ago the last BEGIN without a COMMIT or ROLLBACK was. VACUUM will not touch tuples which were deleted after the oldest not-yet-finished transaction started, regardless of whether that transaction touched the vacuumed table in any way...

> My longest transactions on the tables in question are typically quite short until of course they begin to bloat.

Well, your application might be completely well behaved, and still your DBA (or your favorite DB access tool for that matter) can leave open transactions in an interactive session. It happened a few times to us; some of those were bad coding in ad-hoc tools written by us, others were badly behaved DB access tools opening a transaction immediately after connect and after each successful command, effectively leaving a transaction open while the user is out for lunch... So it might very well be that some interactive or ad-hoc tool you're using to manage the DB is your problem.

Cheers,
Csaba.
Re: [PERFORM] Help tuning autovacuum - seeing lots of relation bloat
On Wed, 2006-06-21 at 18:21, jody brownell wrote:
> That is interesting. There is one thread keeping a transaction open, it appears from ps: "postgres: app app xxx(42644) idle in transaction"

That shouldn't be a problem in itself, "idle in transaction" happens all the time between 2 commands in the same transaction... you only have a problem if you see the same PID always idle - that means somebody left an open transaction and went to lunch.

> [snip] this was with the "idle in transaction" though.

This probably means you don't have long-running transactions currently. However, if you happen to have just one such long transaction, the dead space accumulates and normal vacuum will not be able to clean it up anymore. But I guess if you didn't find one now, then you should take a look at Tom's suggestion and bump up the debug level to see if autovacuum picks up your table at all...

> Ah HA! Wondering, my autovacuum naptime is 60 seconds; that is also the interval at which I wake up and begin persistence. Wondering if I am simply locking autovacuum out of the tables b/c they are on a similar timeline. I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often. Make sense?

I don't think that's your problem... vacuum wouldn't be locked out by any activity which doesn't lock the table exclusively (and I guess you're not doing that). If your persistence finishes quickly, then that's not the problem.

Oh, it just occurred to me... in order to use autovacuum you also need to enable the statistics collector at row level:

stats_start_collector = on
stats_row_level = on

See also: http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

These were not among the settings in your original post, so I guess you didn't touch them, and I think they are disabled by default. If that's the case, you should enable them and pg_ctl reload; that should fix the problem.

Cheers,
Csaba.
Re: [PERFORM] Identical query on two machines, different plans....
You very likely forgot to run ANALYZE on your laptop after copying the data. Observe the different row count estimates in the 2 plans...

HTH,
Csaba.

                              QUERY PLAN
 Index Scan using ticketing_codes_uq_value_group_id on ticketing_codes  (cost=0.00..6.02 rows=1 width=4) (actual time=0.104..0.107 rows=1 loops=1)
                                                                                           ^^
   Index Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000))
 Total runtime: 0.148 ms
(3 rows)

                              QUERY PLAN
 Bitmap Heap Scan on ticketing_codes  (cost=2.01..1102.05 rows=288 width=4) (actual time=88.164..88.170 rows=1 loops=1)
   Recheck Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000))
   ->  Bitmap Index Scan on ticketing_codes_uq_value_group_id  (cost=0.00..2.01 rows=288 width=0) (actual time=54.397..54.397 rows=1 loops=1)
         Index Cond: (((code_value)::text = 'C7ZP2U'::text) AND (code_group_id = 1000))
 Total runtime: 88.256 ms
(5 rows)
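The fix is a one-liner (table name taken from the plans above):

    ANALYZE ticketing_codes;   -- refresh planner statistics after loading the data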
Re: [PERFORM] Identical query on two machines, different plans....
OK, I marked the wrong row counts, but the conclusion is the same.

Cheers,
Csaba.

[The same two plans quoted as in the previous message.]
Re: [PERFORM] WAL logging of SELECT ... INTO command
On Wed, 2006-03-22 at 16:35, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Oh, so in other words, SELECT * INTO temp FROM table is inherently non-deterministic at the physical level, so the only way to be able to allow PITR to work is to duplicate all the physical changes. Darn. Well, lemme put it this way: I'm not prepared to require that PG be deterministic at the physical level. One obvious source of non-determinism is the FSM, which is likely to hand out different free space to different transactions depending on what else is going on at the same time. There are others, such as deliberately random tie-breaking during btree index insertion. While we're at it, talking about WAL and PITR... I see from the above discussion that PITR already demands special handling in the code (I hope I got this right, as the following is based on it). What if the PITR logging were disconnected from the WAL logging completely ? What I mean is to introduce a WAL subscription mechanism, which basically means some incoming connections to which we stream the log records. We don't need to write them to disk at all in the normal case; I guess usually PITR will store the records on some other machine, so it means network, not disk. And it doesn't need to be done synchronously, it can lag behind the running transactions, and we can do it in batches of WAL records. It would also mean that the local WAL does not need to log the things which are only needed for PITR... that would likely save some WAL disk activity. Of course it would also mean that the local WAL and the PITR WAL are not the same, but that is not an issue I guess. It would also permit immediate recycling of the WAL files if the current archiving style is not used. The drawbacks I can see (please add yours): 1) the need for the subscription management code, with the added complexity it implies; 2) problems if the WAL stream lags too much behind; 3) problems if the subscribed client's connection is interrupted. Nr. 2 could be solved by saving the PITR WAL separately if the lag grows over a threshold, and issuing a warning. This could still be acceptable, as the writing doesn't have to be synchronous and can be done in relatively large blocks. There could be a second, bigger lag threshold which completely cancels the subscription. All these thresholds should be configurable, as it depends on the application what's more important: to have the standby available all the time, or to have the primary faster when loaded... Nr. 3 could be solved by either canceling the subscription on connection drop, or by allowing a certain amount of time after which the subscription is canceled. The client can reconnect before this timeout expires; in the meantime the primary can store the PITR WAL on disk as mentioned above... Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Migration study, step 1: bulk write performance
Did you try mounting ext3 with data=writeback by chance? People have found that makes a big difference in performance. I'm not sure, there are other people here doing the OS stuff - I'm pretty much ignorant about what data=writeback could mean :-D They knew however that for the data partitions no FS journaling is needed, and for the WAL partition metadata journaling is enough, so I guess they tuned ext3 for this. Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Migration study, step 1: bulk write performance
For the record, that's the wrong way round. For the data partition metadata journaling is enough, and for the WAL partition you don't need any FS journaling at all. Yes, you're right: the data partition shouldn't lose file creation, deletion, etc., which is not important for the WAL partition where the WAL files are mostly recycled... right ? Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
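For illustration only - device names and mount points are made up, and whether metadata-only journaling (data=writeback) is an acceptable risk for your data partition is a separate discussion:

  # data partition: metadata journaling is enough
  /dev/sdb1  /pgdata  ext3  noatime,data=writeback  0 2
  # WAL partition: no FS journaling needed at all, plain ext2 will do
  /dev/sdc1  /pgwal   ext2  noatime                 0 2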
Re: [PERFORM] Migration study, step 1: bulk write performance
Mikael, I've just recently been through such an exercise, i.e. migrating a DB in about the same size category as yours from another vendor to postgres. I think you got it right with fsync turned off during migration (just don't forget to turn it back on after finishing ;-), and using tables without indexes/foreign keys. In our case recreating all the indexes/foreign keys/other constraints actually took longer than the raw data transfer itself... but it's possible that the process was not tuned 100%, we are still learning how to tune postgres... What I can add from our experience: ext3 turned out lousy for our application, and converting to XFS made quite a big improvement for our DB load. I don't have hard figures, but I think it was some 30% improvement in overall speed, and it was a huge improvement under heavy load... what I mean is that with ext3, multiple big tasks running in parallel took longer than if we had executed them sequentially, and with XFS that was gone, the load scales linearly. In any case you should test the performance of your application on different file systems and different settings, as this could make a huge difference. And another thing, we're still fighting with performance problems due to the fact that our application was designed to perform well with the other DB product... I think you'll have more work to do in this regard than just some search/replace ;-) Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
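A rough sketch of the order of operations (object names are made up, and turning fsync off is only acceptable because a failed migration can simply be restarted from scratch):

  -- postgresql.conf, for the duration of the load only:  fsync = off
  COPY big_table FROM '/path/to/big_table.dump';        -- raw data first, no indexes/FKs yet
  CREATE INDEX big_table_foo_idx ON big_table (foo);    -- indexes after the data
  ALTER TABLE big_table ADD CONSTRAINT big_table_bar_fk
        FOREIGN KEY (bar_id) REFERENCES bar (id);       -- constraints last
  ANALYZE big_table;
  -- then set fsync = on again before going live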
Re: [PERFORM] neverending vacuum
So one very effective way of speeding this process up is giving the vacuum process lots of memory, because it will have to do fewer passes at each index. How much do you have? OK, this is my problem... it is left at default (16 megabyte ?). This must be a mistake in configuration, on other similar boxes I set this to 262144 (256 megabyte). The box has 4 Gbyte memory. Thanks for the explanation - you were right on the spot, it will likely solve the problem. Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
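For reference, the setting in question (the value is the 256 MB mentioned above; on 8.0 and later it is called maintenance_work_mem, older releases call it vacuum_mem):

  maintenance_work_mem = 262144   # kB, used by VACUUM, CREATE INDEX, ...

It can also be raised just for one session before a manual vacuum: SET maintenance_work_mem = 262144;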
Re: [PERFORM] Can this query go faster???
Joost, Why do you use an offset here ? I guess you're traversing the table somehow, in this case it would be better to remember the last zipcode + housenumber and put an additional condition to get the next bigger than the last one you've got... that would go for the index on zipcode+housenumber and be very fast. The big offset forces postgres to traverse that many entries until it's able to pick the one row for the result... On Tue, 2005-12-06 at 10:43, Joost Kraaijeveld wrote: Hi, Is it possible to get this query run faster than it does now, by adding indexes, changing the query? SELECT customers.objectid FROM prototype.customers, prototype.addresses WHERE customers.contactaddress = addresses.objectid ORDER BY zipCode asc, housenumber asc LIMIT 1 OFFSET 283745 Explain: Limit (cost=90956.71..90956.71 rows=1 width=55) - Sort (cost=90247.34..91169.63 rows=368915 width=55) Sort Key: addresses.zipcode, addresses.housenumber - Hash Join (cost=14598.44..56135.75 rows=368915 width=55) Hash Cond: (outer.contactaddress = inner.objectid) - Seq Scan on customers (cost=0.00..31392.15 rows=368915 width=80) - Hash (cost=13675.15..13675.15 rows=369315 width=55) - Seq Scan on addresses (cost=0.00..13675.15 rows=369315 width=55) The customers table has an index on contactaddress and objectid. The addresses table has an index on zipcode+housenumber and objectid. TIA -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
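To illustrate the "remember the last key" approach (the :last_* placeholders are pseudo-parameters, and with duplicate zipcode+housenumber pairs you would need an extra tie-breaker column):

  SELECT customers.objectid
    FROM prototype.customers
    JOIN prototype.addresses ON customers.contactaddress = addresses.objectid
   WHERE addresses.zipcode > :last_zipcode
      OR (addresses.zipcode = :last_zipcode
          AND addresses.housenumber > :last_housenumber)
   ORDER BY addresses.zipcode ASC, addresses.housenumber ASC
   LIMIT 1;

This lets the planner walk the zipcode+housenumber index from where the previous page ended, instead of materializing and throwing away 283745 rows.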
Re: [PERFORM] Can this query go faster???
On Tue, 2005-12-06 at 13:20, Joost Kraaijeveld wrote: [snip] Ah, a misunderstanding: I only need to calculate an index if the user wants a record that is not in or adjacent to the cache (in which case I can do a select based on the last value in the cache). So I must always materialize all rows below the wanted index. In this case the query will very likely not get faster. It must always visit all the records up to the required offset. If the plan ought to be faster using the index, then you probably need to analyze (I don't recall from your former posts if you did it recently or not); in any case you could check an explain analyze to see whether the planner is mistaken or not - you might already know this. Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] VERY slow after many updates
Alex, I suppose the table is a kind of 'queue' table, where you insert/get/delete continuously, and the life of the records is short. Considering that in postgres a delete will still leave you the record in the table's file and in the indexes, just mark it as dead, your table's actual size can grow quite a lot even if the number of live records will stay small (you will have a lot of dead tuples, the more tasks processed, the more dead tuples). So I guess you should vacuum this table very often, so that the dead tuples are reused. I'm not an expert on this, but it might be good to vacuum after each n deletions, where n is ~ half the average size of the queue you expect to have. From time to time you might want to do a vacuum full on it and a reindex. Right now I guess a vacuum full + reindex will help you. I think it's best to do: vacuum download_queue; vacuum full download_queue; reindex download_queue; I think the non-full vacuum which is less obtrusive than the full one will do at least some of the work and it will bring all needed things in FS cache, so the full vacuum to be as fast as possible (vacuum full locks exclusively the table). At least I do it this way with good results for small queue-like tables... BTW, I wonder if the download_queue_user_index index is helping you at all on that table ? Do you expect it to grow bigger than 1000 ? Otherwise it has no point to index it. HTH, Csaba. On Sat, 2005-11-19 at 08:46, Alex Wang wrote: I am using PostgreSQL in an embedded system which has only 32 or 64 MB RAM (run on PPC 266 MHz or ARM 266MHz CPU). I have a table to keep downlaod tasks. There is a daemon keep looking up the table and fork a new process to download data from internet. Daemon: . Check the table every 5 seconds . Fork a download process to download if there is new task Downlaod process (there are 5 download process max): . Update the download rate and downloaded size every 3 seconds. At begining, everything just fine. The speed is good. But after 24 hours, the speed to access database become very very slow. Even I stop all processes, restart PostgreSQL and use psql to select data, this speed is still very very slow (a SQL command takes more than 2 seconds). It is a small table. There are only 8 records in the table. The only way to solve it is remove all database, run initdb, create new database and insert new records. I tried to run vacummdb but still very slow. Any idea to make it faster? Thanks, Alex -- Here is the table schema: create table download_queue ( task_id SERIAL, username varchar(128), pid int, url text, filename varchar(1024), status int, created_time int, started_time int, total_size int8, current_size int8, current_rate int, CONSTRAINT download_queue_pkey PRIMARY KEY(task_id) ); CREATE INDEX download_queue_user_index ON download_queue USING BTREE (username); ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
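If you cannot run autovacuum on such a small box, a plain cron entry can do the frequent vacuuming; a sketch (database name is made up, and the interval has to be tuned to your delete rate - a loop in the daemon itself would work just as well):

  # every 10 minutes, a cheap non-full vacuum of the queue table
  */10 * * * *  psql -d mydb -c "VACUUM download_queue"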
Re: [PERFORM] VERY slow after many updates
Just for clarification, update is actually equal to delete+insert in Postgres. So if you update rows, it's the same as you would delete the row and insert a new version. So the table is bloating also in this situation. I think there is an added problem when you update, namely to get to a row, postgres will traverse all dead rows matching the criteria... so even if you have an index, getting 1 row which was updated 1 times will access 1 rows only to find 1 which is still alive. So in this case vacuuming should happen even more often, to eliminate the dead rows. And the index was probably only helping because the table was really bloated, so if you vacuum it often enough you will be better off without the index if the row count will stay low. Cheers, Csaba. On Sat, 2005-11-19 at 13:05, Alex Wang wrote: Hi Csaba, Thanks for your reply. Yes, it's a queue table. But I did not perform many insert/delete before it becomes slow. After insert 10 records, I just do get/update continuously. After 24 hour, the whole database become very slow (not only the download_queue table but other tables, too). But you are right. Full vacuum fixes the problem. Thank you very much! I expect there will be less than 1000 records in the table. The index does obvous improvement on SELECT task_id, username FROM download_queue WHERE username '%s' even there are only 100 records. Thanks, Alex - Original Message - From: Csaba Nagy [EMAIL PROTECTED] To: Alex Wang [EMAIL PROTECTED] Cc: postgres performance list pgsql-performance@postgresql.org Sent: Saturday, November 19, 2005 7:12 PM Subject: Re: [PERFORM] VERY slow after many updates Alex, I suppose the table is a kind of 'queue' table, where you insert/get/delete continuously, and the life of the records is short. Considering that in postgres a delete will still leave you the record in the table's file and in the indexes, just mark it as dead, your table's actual size can grow quite a lot even if the number of live records will stay small (you will have a lot of dead tuples, the more tasks processed, the more dead tuples). So I guess you should vacuum this table very often, so that the dead tuples are reused. I'm not an expert on this, but it might be good to vacuum after each n deletions, where n is ~ half the average size of the queue you expect to have. From time to time you might want to do a vacuum full on it and a reindex. Right now I guess a vacuum full + reindex will help you. I think it's best to do: vacuum download_queue; vacuum full download_queue; reindex download_queue; I think the non-full vacuum which is less obtrusive than the full one will do at least some of the work and it will bring all needed things in FS cache, so the full vacuum to be as fast as possible (vacuum full locks exclusively the table). At least I do it this way with good results for small queue-like tables... BTW, I wonder if the download_queue_user_index index is helping you at all on that table ? Do you expect it to grow bigger than 1000 ? Otherwise it has no point to index it. HTH, Csaba. On Sat, 2005-11-19 at 08:46, Alex Wang wrote: I am using PostgreSQL in an embedded system which has only 32 or 64 MB RAM (run on PPC 266 MHz or ARM 266MHz CPU). I have a table to keep downlaod tasks. There is a daemon keep looking up the table and fork a new process to download data from internet. Daemon: . Check the table every 5 seconds . Fork a download process to download if there is new task Downlaod process (there are 5 download process max): . 
Update the download rate and downloaded size every 3 seconds. At begining, everything just fine. The speed is good. But after 24 hours, the speed to access database become very very slow. Even I stop all processes, restart PostgreSQL and use psql to select data, this speed is still very very slow (a SQL command takes more than 2 seconds). It is a small table. There are only 8 records in the table. The only way to solve it is remove all database, run initdb, create new database and insert new records. I tried to run vacummdb but still very slow. Any idea to make it faster? Thanks, Alex -- Here is the table schema: create table download_queue ( task_id SERIAL, username varchar(128), pid int, url text, filename varchar(1024), status int, created_time int, started_time int, total_size int8, current_size int8, current_rate int, CONSTRAINT download_queue_pkey PRIMARY KEY(task_id) ); CREATE INDEX download_queue_user_index ON download_queue USING BTREE (username); -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast
Re: [PERFORM] Deleting Records
Christian, Do you have foreign keys pointing to your table with ON DELETE CASCADE... ? Because in that case you're not only deleting your 22000 records, but the whole tree of cascades. And if you don't have an index on one of those foreign keys, then you might have a sequential scan of the child table on each deleted row... I would check the foreign keys. HTH, Csaba. On Thu, 2005-10-20 at 10:43, Christian Paul B. Cosinas wrote: Hi! I'm experiencing a very slow deletion of records, which I think is not right. I have a Dual Xeon Server with 6gig Memory. I am only deleting about 22,000 records but it took me more than 1 hour to finish this. What could I possibly do to make this fast? Here is the code inside my function: FOR temp_rec IN SELECT * FROM item_qc_doer LOOP DELETE FROM qc_session WHERE item_id = temp_rec.item_id; DELETE FROM item_qc_doer WHERE item_id = temp_rec.item_id; END LOOP; The item_qc_doer table contains 22,000 records. ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
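Two things worth trying here, sketched with a guessed index name: first make sure the referencing column is indexed (assuming qc_session.item_id is not), then replace the row-by-row loop with set-oriented deletes:

  CREATE INDEX qc_session_item_id_idx ON qc_session (item_id);

  DELETE FROM qc_session
   WHERE item_id IN (SELECT item_id FROM item_qc_doer);
  DELETE FROM item_qc_doer;

The loop in the original function deletes every row of item_qc_doer anyway, so the second statement can simply clear the whole table.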
Re: [PERFORM] Help tuning postgres
In the light of what you've explained below about nonremovable row versions reported by vacuum, I wonder if I should worry about the following type of report: INFO: vacuuming public.some_table INFO: some_table: removed 29598 row versions in 452 pages DETAIL: CPU 0.01s/0.04u sec elapsed 18.77 sec. INFO: some_table: found 29598 removable, 39684 nonremovable row versions in 851 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.02s/0.07u sec elapsed 23.16 sec. VACUUM Does that mean that 39684 nonremovable pages are actually the active live pages in the table (as it reports 0 dead) ? I'm sure I don't have any long running transaction, at least according to pg_stats_activity (backed by the linux ps too). Or I should run a vacuum full... This table is one of which has frequently updated rows. TIA, Csaba. On Thu, 2005-10-13 at 14:40, Andrew Sullivan wrote: On Thu, Oct 13, 2005 at 10:15:03AM +0200, Csaba Nagy wrote: OK, this sounds interesting, but I don't understand: why would an update chase down a lot of dead tuples ? Should I read up on some docs, cause I obviously don't know enough about how updates work on postgres... Right. Here's the issue: MVCC does not replace rows when you update. Instead, it marks the old row as expired, and sets the new values. The old row is still there, and it's available for other transactions who need to see it. As the docs say (see http://www.postgresql.org/docs/8.0/interactive/transaction-iso.html), In effect, a SELECT query sees a snapshot of the database as of the instant that that query begins to run. And that can be true because the original data is still there, although marked as expired for subsequent transactions. UPDATE works the same was as SELECT in terms of searching for rows (so does any command that searches for data). Now, when you select data, you actually have to traverse all the existing versions of the tuple in order to get the one that's live for you. This is normally not a problem: VACUUM goes around and cleans out old, expired data that is not live for _anyone_. It does this by looking for the oldest transaction that is open. (As far as I understand it, this is actually the oldest transaction in the entire back end; but I've never understood why that should the the case, and I'm too incompetent/dumb to understand the code, so I may be wrong on this point.) If you have very long-running transactions, then, you can end up with a lot of versions of dead tuples on the table, and so reading the few records you want can turn out actually to be a very expensive operation, even though it ought to be cheap. You can see this by using the VERBOSE option to VACUUM: test=# VACUUM VERBOSE eval1 ; INFO: vacuuming public.eval1 INFO: eval1: found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming pg_toast.pg_toast_18831 INFO: index pg_toast_18831_index now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: pg_toast_18831: found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Note those removable and nonremovable row versions. It's the unremovable ones that can hurt. 
WARNING: doing VACUUM on a big table on a disk that's already pegged is going to cause you performance pain, because it scans the whole table. In some cases, though, you have no choice: if the winds are already out of your sails, and you're effectively stopped, anything that might get you moving again is an improvement. And how would the analyze help in finding this out ? I thought it would only show me additionally the actual timings, not more detail in what was done... Yes, it shows the actual timings, and the actual number of rows. But if the estimates that the planner makes are wildly different than the actual results, then you know your statistics are wrong, and that the planner is going about things the wrong way. ANALYSE is a big help. There's also a verbose option to it, but it's usually less useful in production situations. A ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Help tuning postgres
First of all thanks all for the input. I probably can't afford even the reindex till Christmas, when we have about 2 weeks of company holiday... but I guess I'll have to do something before Christmas. The system should at least look like it is working all the time. I can have downtime, but only for short periods, preferably less than 1 minute. The tables we're talking about have ~10 million rows the smaller ones and ~150 million rows the bigger ones, and I guess reindex will take quite some time. I wonder if I could devise a scheme like: - create a temp table exactly like the production table, including indexes and foreign keys; - create triggers on the production table which log all inserts, deletes, updates to a log table; - activate these triggers; - copy all data from the production table to the temp table (this will take the bulk of the time needed for the whole operation); - replay the log on the temp table repeatedly if necessary, until the temp table is sufficiently close to the original; - rename the original table to something else, and then rename the temp table to the original name, all this in a transaction - this would ideally be the only visible delay for the user, and if the system is not busy, it should be quick I guess; - replay the log one more time. All this should happen at a point in time when there's little traffic to the database. Replaying could be as simple as a few delete triggers on the log table, which replay the deleted record on the production table, the replay then consisting of a delete operation on the log table. This is so that new log entries can be replayed later without replaying again what was already replayed. The big tables I should do this procedure on have a low probability of conflicting operations (like insert and immediate delete of the same row, or multiple inserts of the same row, multiple conflicting updates of the same row, etc.), this is why I think replaying the log will work fine... of course this whole setup will be a lot more work than just a reindex... I wonder if somebody has tried anything like this and if it has a chance to work ? Thanks, Csaba. On Tue, 2005-10-18 at 17:18, Robert Treat wrote: reindex should be faster, since you're not dumping/reloading the table contents on top of rebuilding the index, you're just rebuilding the index. Robert Treat emdeon Practice Services Alachua, Florida On Wed, 2005-10-12 at 13:32, Steve Poe wrote: Would it not be faster to do a dump/reload of the table than reindex or is it about the same? Steve Poe On Wed, 2005-10-12 at 13:21 -0400, Tom Lane wrote: Emil Briggs [EMAIL PROTECTED] writes: Not yet, the db is in production use and I have to plan for a down-time for that... or is it not impacting the activity on the table ? It will cause some performance hit while you are doing it. It'll also lock out writes on the table until the index is rebuilt, so he does need to schedule downtime. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
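In SQL terms, the copy-and-swap described above could look roughly like this (names are made up, and the trigger/replay machinery is only indicated - that is where the real work is):

  CREATE TABLE big_table_copy (LIKE big_table);        -- indexes/FKs recreated separately
  CREATE TABLE big_table_log (op char(1), pk bigint);  -- whatever is needed to replay changes
  -- ... plus triggers on big_table filling big_table_log ...
  INSERT INTO big_table_copy SELECT * FROM big_table;  -- the long-running bulk copy
  -- ... replay big_table_log onto big_table_copy until it is nearly caught up ...
  BEGIN;
  ALTER TABLE big_table      RENAME TO big_table_old;
  ALTER TABLE big_table_copy RENAME TO big_table;
  COMMIT;
  -- replay the last log entries, then drop big_table_old once everything checks out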
Re: [PERFORM] Help tuning postgres
[snip] Yes, but it could be a disk issue because you're doing more work than you need to. If your UPDATEs are chasing down a lot of dead tuples, for instance, you'll peg your I/O even though you ought to have I/O to burn. OK, this sounds interesting, but I don't understand: why would an update chase down a lot of dead tuples ? Should I read up on some docs, cause I obviously don't know enough about how updates work on postgres... And how would the analyze help in finding this out ? I thought it would only show me additionally the actual timings, not more detail in what was done... Thanks, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Help tuning postgres
Thanks Andrew, this explanation about the dead rows was enlightening. Might be the reason for the slowdown I see on occasions, but not for the case which I was first observing. In that case the updated rows are different for each update. It is possible that each row has a few dead versions, but not too many, each row is updated just a limited number of times. However, we have other updates which access the same row 1000s of times (up to millions of times), and that could hurt if it's like you said, i.e. if each update has to crawl over all the dead rows... I have now autovacuum in place, and I'm sure it will kick in at ~ a few 1s of updates, but in the meantime it could get bad. In any case, I suppose that those disk pages should be in OS cache pretty soon and stay there, so I still don't understand why the disk usage is 100% in this case (with very low CPU activity, the CPUs are mostly waiting/idle)... the amount of actively used data is not that big. I'll try to vacuum through cron jobs the most exposed tables to this multiple-dead-row-versions symptom, cause autovacuum probably won't do it often enough. Let's see if it helps... Thanks, Csaba. On Thu, 2005-10-13 at 14:40, Andrew Sullivan wrote: On Thu, Oct 13, 2005 at 10:15:03AM +0200, Csaba Nagy wrote: OK, this sounds interesting, but I don't understand: why would an update chase down a lot of dead tuples ? Should I read up on some docs, cause I obviously don't know enough about how updates work on postgres... Right. Here's the issue: MVCC does not replace rows when you update. Instead, it marks the old row as expired, and sets the new values. The old row is still there, and it's available for other transactions who need to see it. As the docs say (see http://www.postgresql.org/docs/8.0/interactive/transaction-iso.html), In effect, a SELECT query sees a snapshot of the database as of the instant that that query begins to run. And that can be true because the original data is still there, although marked as expired for subsequent transactions. UPDATE works the same was as SELECT in terms of searching for rows (so does any command that searches for data). Now, when you select data, you actually have to traverse all the existing versions of the tuple in order to get the one that's live for you. This is normally not a problem: VACUUM goes around and cleans out old, expired data that is not live for _anyone_. It does this by looking for the oldest transaction that is open. (As far as I understand it, this is actually the oldest transaction in the entire back end; but I've never understood why that should the the case, and I'm too incompetent/dumb to understand the code, so I may be wrong on this point.) If you have very long-running transactions, then, you can end up with a lot of versions of dead tuples on the table, and so reading the few records you want can turn out actually to be a very expensive operation, even though it ought to be cheap. You can see this by using the VERBOSE option to VACUUM: test=# VACUUM VERBOSE eval1 ; INFO: vacuuming public.eval1 INFO: eval1: found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming pg_toast.pg_toast_18831 INFO: index pg_toast_18831_index now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. 
INFO: pg_toast_18831: found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Note those removable and nonremovable row versions. It's the unremovable ones that can hurt. WARNING: doing VACUUM on a big table on a disk that's already pegged is going to cause you performance pain, because it scans the whole table. In some cases, though, you have no choice: if the winds are already out of your sails, and you're effectively stopped, anything that might get you moving again is an improvement. And how would the analyze help in finding this out ? I thought it would only show me additionally the actual timings, not more detail in what was done... Yes, it shows the actual timings, and the actual number of rows. But if the estimates that the planner makes are wildly different than the actual results, then you know your statistics are wrong, and that the planner is going about things the wrong way. ANALYSE is a big help. There's also a verbose option to it, but it's usually less useful in production situations. A ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining
[PERFORM] Help tuning postgres
Hi all, After a long time of reading the general list it's time to subscribe to this one... We have adapted our application (originally written for oracle) to postgres, and switched part of our business to a postgres data base. The data base has in the main tables around 150 million rows, the whole data set takes ~ 30G after the initial migration. After ~ a month of usage that bloated to ~ 100G. We installed autovacuum after ~ 2 weeks. The main table is heavily updated during the active periods of usage, which is coming in bursts. Now Oracle on the same hardware has no problems handling it (the load), but postgres comes to a crawl. Examining the pg_stats_activity table I see the updates on the main table as being the biggest problem, they are very slow. The table has a few indexes on it, I wonder if they are updated too on an update ? The index fields are not changing. In any case, I can't explain why the updates are so much slower on postgres. Sorry for being fuzzy a bit, I spent quite some time figuring out what I can do and now I have to give up and ask for help. The machine running the DB is a debian linux, details: $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 11 model name : Intel(R) Pentium(R) III CPU family 1266MHz stepping: 1 cpu MHz : 1263.122 cache size : 512 KB fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips: 2490.36 processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 11 model name : Intel(R) Pentium(R) III CPU family 1266MHz stepping: 1 cpu MHz : 1263.122 cache size : 512 KB fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips: 2514.94 $ uname -a Linux *** 2.6.12.3 #1 SMP Tue Oct 11 13:13:00 CEST 2005 i686 GNU/Linux $ cat /proc/meminfo MemTotal: 4091012 kB MemFree:118072 kB Buffers: 18464 kB Cached:3393436 kB SwapCached: 0 kB Active: 947508 kB Inactive: 2875644 kB HighTotal: 3211264 kB HighFree: 868 kB LowTotal: 879748 kB LowFree:117204 kB SwapTotal: 0 kB SwapFree:0 kB Dirty: 13252 kB Writeback: 0 kB Mapped: 829300 kB Slab:64632 kB CommitLimit: 2045504 kB Committed_AS: 1148064 kB PageTables: 75916 kB VmallocTotal: 114680 kB VmallocUsed:96 kB VmallocChunk: 114568 kB The disk used for the data is an external raid array, I don't know much about that right now except I think is some relatively fast IDE stuff. In any case the operations should be cache friendly, we don't scan over and over the big tables... The postgres server configuration is attached. I have looked in the postgres statistics tables, looks like most of the needed data is always cached, as in the most accessed tables the load/hit ratio is mostly something like 1/100, or at least 1/30. Is anything in the config I got very wrong for the given machine, or what else should I investigate further ? If I can't make this fly, the obvious solution will be to move back to Oracle, cash out the license and forget about postgres forever... TIA, Csaba. # - # PostgreSQL configuration file # - # # This file consists of lines of the form: # # name = value # # (The '=' is optional.) White space may be used. Comments are introduced # with '#' anywhere on a line. The complete list of option names and # allowed values can be found in the PostgreSQL documentation. 
The # commented-out settings shown in this file represent the default values. # # Please note that re-commenting a setting is NOT sufficient to revert it # to the default value, unless you restart the postmaster. # # Any option can also be given as a command line switch to the # postmaster, e.g. 'postmaster -c log_connections=on'. Some options # can be changed at run-time with the 'SET' SQL command. # # This file is read on postmaster startup and when the postmaster # receives a SIGHUP. If you edit the file on a running system, you have # to SIGHUP the postmaster for the changes to take effect, or use # pg_ctl reload. Some settings, such as listen_address, require # a postmaster shutdown and restart to take effect. #--- # FILE LOCATIONS #--- # The default values of these variables
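For comparison, a hedged set of starting values one might try on a 4 GB dual-CPU 8.x box like this one - these are generic rules of thumb, not the settings from the attached file:

  shared_buffers = 30000          # 8 kB pages, ~240 MB
  work_mem = 16384                # kB, per sort/hash operation
  maintenance_work_mem = 262144   # kB, helps VACUUM and index builds
  effective_cache_size = 350000   # 8 kB pages, ~2.7 GB, roughly what the OS keeps cached
  checkpoint_segments = 16
  wal_buffers = 64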
Re: [PERFORM] Help tuning postgres
[snip] Have you tried reindexing your active tables? Not yet, the db is in production use and I have to plan for a down-time for that... or is it not impacting the activity on the table ? Emil ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] [JDBC] Performance problem using V3 protocol in jdbc driver
Barry, I have made a similar experience, moving a big Oracle data base to Postgres 8.03 on linux. The first impact was similar, huge performance problems. The main problem was bad planner choices. The cause in our case: bad parameter types in the jdbc set methods (I guess you use Java). For oracle we used the NUMERIC type to set primary keys, but the postgres id type used was BIGINT, and it just refused to use the index in this case. Imagine that kicking in on a 100 million rows table... a sequential scan started a few times a second, now that made the DB unusable. So we fixed the code that for oracle continues to use NUMERIC and for postgres it uses BIGINT, and that is very important on setNull calls too. One very useful tool was the following query: prepare ps as SELECT procpid, substring(current_query for 97), to_char((now()-query_start), 'HH24:MI:SS') as t FROM pg_stat_activity where current_query not like '%insufficient%' and current_query not like '%IDLE%' order by t desc; Then you just execute ps; in psql, and it will show you the queries which are already running for a while. Other problems were caused by complex queries, where more than 2 tables were joined. For oracle we were giving hints in the form of special comments, to point to the right index, right plan, but that's not an option for postgres (yet ?). So the fix in this case was to use explicit joins which do influence the postgres planner choices. This fixed another class of issues for us... Another problem: if you want to avoid worst-case plans, and do away with a generic plan for all cases, then you might force the usage of server side prepare statements in all cases. I had to do that, a lot of queries were performing very badly without this. Now maybe that could be solved by raising the statistics targets where needed, but in my case the generic plan was always good enough, by design. We rely on the DB picking a good generic plan in all cases. One typical example for us would be: a limit query which select 20 rows out of 100 million, with a where clause which actually selects 1 row out of it for the last chunk... it was going for an index scan, but on the wrong index. The right index would have selected that exactly 1 row, the wrong one had to cruise through a few million rows... the limit fooled the planner that it will get 20 rows quickly. Now when I forced the usage of a prepared statement, it went for the right index and all was good. I actually set this in our connection pool: ((PGConnection)connection).setPrepareThreshold(1); but it is possible to set/reset it on a statement level, I just didn't find any query I should to do it for yet... the DB is steady now. Another issue was that we've had some functional indexes on oracle returning null for uninteresting rows, to lower the index size. This is easier to implement on postgres using a partial index, which has a lot simpler syntax than the oracle hack, and it is easier to handle. The catch was that we needed to change the where clause compared to oracle so that postgres picks the partial index indeed. There are cases where the planner can't figure out that it can use the index, especially if you use prepared statements and one of the parameters is used in the index condition. In this case it is needed to add the proper restriction to the where clause to point postgres to use the partial index. Using partial indexes speeds up the inserts and updates on those tables, and could speed up some selects too. Hmmm... that's about what I recall now... 
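The partial index trick in Postgres syntax, with invented table/column names to keep it concrete:

  CREATE INDEX orders_open_customer_idx
      ON orders (customer_id)
   WHERE status = 'OPEN';

  -- the query has to repeat the predicate literally, so the planner can prove the index applies:
  SELECT * FROM orders
   WHERE customer_id = 42 AND status = 'OPEN';

With a prepared statement where the status is passed as a parameter, the planner cannot prove the match, which is exactly the "add the proper restriction to the where clause" issue mentioned above.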
beside the postgres admin stuff, have you analyzed your data after import ? I forgot to do that at first, and almost reverted again back to oracle... and then after a few days it was very clear that running the auto-vacuum daemon is also a must :-) And: for big data sets is important to tweak all performance settings in the config file, otherwise you get surprises. We've been running a smaller instance of the same code on postgres for quite a while before deciding to migrate a big one, and that was cruising along happily with the default settings, so the first time we needed to do optimizations was when using a data set with a lot of data in it... HTH, Csaba. On Wed, 2005-08-17 at 06:42, Barry Lind wrote: We just moved a large production instance of ours from Oracle to Postgres 8.0.3 on linux. When running on Oracle the machine hummed along using about 5% of the CPU easily handling the fairly constant load, after moving the data to Postgres the machine was pretty much maxed out on CPU and could no longer keep up with the transaction volume. On a hunch I switched the jdbc driver to using the V2 protocol and the load on the machine dropped down to what it was when using Oracle and everything was fine. Now obviously I have found a work around for the performance problem, but I really don’t want to rely on using the V2 protocol forever, and don’t want to have to recommend to our customers that they need