Re: [PERFORM] Help tuning postgres
[snip] Yes, but it could be a disk issue because you're doing more work than you need to. If your UPDATEs are chasing down a lot of dead tuples, for instance, you'll peg your I/O even though you ought to have I/O to burn. OK, this sounds interesting, but I don't understand: why would an update chase down a lot of dead tuples ? Should I read up on some docs, cause I obviously don't know enough about how updates work on postgres... And how would the analyze help in finding this out ? I thought it would only show me additionally the actual timings, not more detail in what was done... Thanks, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Help tuning postgres
[EMAIL PROTECTED] wrote: Have you tried reindexing your active tables? It will cause some performance hit while you are doing it. It sounds like something is bloating rapidly on your system and the indexes is one possible place that could be happening. You might consider using contrib/oid2name to monitor physical growth of tables and indexes. There have been some issues with bloat in PostgreSQL versions prior to 8.0, however there might still be some issues under certain circumstances even now, so it does pay to cast an eye on what's going on. If you haven't run vaccum regularly, this might lead to regular vacuums not reclaiming enough dead tuples in one go, so if you've had quite a lot of UPDATE/DELETE activity going onin the past and only just started to use pg_autovacuum after the DB has been in production for quite a while, you might indeed have to run a VACUUM FULL and/or REINDEX on the affected tables, both of which will more or less lock out any client access to the tables als long as they're running. Kind regards Markus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimizer misconfigured ?
Nörder-Tuitje wrote: Hello, I have a strange effect on upcoming structure : People will be wanting the output of EXPLAIN ANALYSE on that query. They'll also ask whether you've VACUUMed, ANALYSEd and configured your postgresql.conf correctly. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Help tuning postgres
Thanks Andrew, this explanation about the dead rows was enlightening. Might be the reason for the slowdown I see on occasions, but not for the case which I was first observing. In that case the updated rows are different for each update. It is possible that each row has a few dead versions, but not too many, each row is updated just a limited number of times. However, we have other updates which access the same row 1000s of times (up to millions of times), and that could hurt if it's like you said, i.e. if each update has to crawl over all the dead rows... I have now autovacuum in place, and I'm sure it will kick in at ~ a few 1s of updates, but in the meantime it could get bad. In any case, I suppose that those disk pages should be in OS cache pretty soon and stay there, so I still don't understand why the disk usage is 100% in this case (with very low CPU activity, the CPUs are mostly waiting/idle)... the amount of actively used data is not that big. I'll try to vacuum through cron jobs the most exposed tables to this multiple-dead-row-versions symptom, cause autovacuum probably won't do it often enough. Let's see if it helps... Thanks, Csaba. On Thu, 2005-10-13 at 14:40, Andrew Sullivan wrote: On Thu, Oct 13, 2005 at 10:15:03AM +0200, Csaba Nagy wrote: OK, this sounds interesting, but I don't understand: why would an update chase down a lot of dead tuples ? Should I read up on some docs, cause I obviously don't know enough about how updates work on postgres... Right. Here's the issue: MVCC does not replace rows when you update. Instead, it marks the old row as expired, and sets the new values. The old row is still there, and it's available for other transactions who need to see it. As the docs say (see http://www.postgresql.org/docs/8.0/interactive/transaction-iso.html), In effect, a SELECT query sees a snapshot of the database as of the instant that that query begins to run. And that can be true because the original data is still there, although marked as expired for subsequent transactions. UPDATE works the same was as SELECT in terms of searching for rows (so does any command that searches for data). Now, when you select data, you actually have to traverse all the existing versions of the tuple in order to get the one that's live for you. This is normally not a problem: VACUUM goes around and cleans out old, expired data that is not live for _anyone_. It does this by looking for the oldest transaction that is open. (As far as I understand it, this is actually the oldest transaction in the entire back end; but I've never understood why that should the the case, and I'm too incompetent/dumb to understand the code, so I may be wrong on this point.) If you have very long-running transactions, then, you can end up with a lot of versions of dead tuples on the table, and so reading the few records you want can turn out actually to be a very expensive operation, even though it ought to be cheap. You can see this by using the VERBOSE option to VACUUM: test=# VACUUM VERBOSE eval1 ; INFO: vacuuming public.eval1 INFO: eval1: found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming pg_toast.pg_toast_18831 INFO: index pg_toast_18831_index now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: pg_toast_18831: found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Note those removable and nonremovable row versions. It's the unremovable ones that can hurt. WARNING: doing VACUUM on a big table on a disk that's already pegged is going to cause you performance pain, because it scans the whole table. In some cases, though, you have no choice: if the winds are already out of your sails, and you're effectively stopped, anything that might get you moving again is an improvement. And how would the analyze help in finding this out ? I thought it would only show me additionally the actual timings, not more detail in what was done... Yes, it shows the actual timings, and the actual number of rows. But if the estimates that the planner makes are wildly different than the actual results, then you know your statistics are wrong, and that the planner is going about things the wrong way. ANALYSE is a big help. There's also a verbose option to it, but it's usually less useful in production situations. A ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining
Re: [PERFORM] Help tuning postgres
On Thu, Oct 13, 2005 at 03:14:44PM +0200, Csaba Nagy wrote: In any case, I suppose that those disk pages should be in OS cache pretty soon and stay there, so I still don't understand why the disk usage is 100% in this case (with very low CPU activity, the CPUs are mostly waiting/idle)... the amount of actively used data is not that big. Ah, but if the sum of all the dead rows is large enough that they start causing your shared memory (== Postgres buffers) to thrash, then you start causing the memory subsystem to thrash on the box, which means less RAM is available for disk buffers because the OS is doing more work; and the disk buffers are full of a lot of garbage _anyway_, so then you may find that you're ending up hitting the disk for some of these reads after all. Around the office I have called this the buffer death spiral. And note that once you've managed to get into a vacuum-starvation case, your free space map might be exceeded, at which point your database performance really won't recover until you've done VACUUM FULL (prior to 7.4 there's also an index problem that's even worse, and that needs occasional REINDEX to solve; I forget which version you said you were using). The painful part about tuning a production system is really that you have to keep about 50 variables juggling in your head, just so you can uncover the one thing that you have to put your finger on to make it all play nice. A -- Andrew Sullivan | [EMAIL PROTECTED] A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] slow update
Patrick Hatcher [EMAIL PROTECTED] writes: Pg 7.4.5 Trying to do a update of fields on 23M row database. Is it normal for this process to take 16hrs and still clocking? Are there foreign keys pointing at the table being updated? If so, failure to index the referencing columns could create this sort of performance problem. Also, in 7.4 you'd better be sure the referencing columns are the same datatype as the referenced column. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] slow update
Thanks. No foreign keys and I've been bitten by the mismatch datatypes and checked that before sending out the message :) Patrick Hatcher Development Manager Analytics/MIO Macys.com Tom Lane [EMAIL PROTECTED] s To Patrick Hatcher 10/13/2005 11:34 [EMAIL PROTECTED] AM cc postgres performance list pgsql-performance@postgresql.org Subject Re: [PERFORM] slow update Patrick Hatcher [EMAIL PROTECTED] writes: Pg 7.4.5 Trying to do a update of fields on 23M row database. Is it normal for this process to take 16hrs and still clocking? Are there foreign keys pointing at the table being updated? If so, failure to index the referencing columns could create this sort of performance problem. Also, in 7.4 you'd better be sure the referencing columns are the same datatype as the referenced column. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org