Thanks Andrew, this explanation about the dead rows was enlightening. Might be the reason for the slowdown I see on occasions, but not for the case which I was first observing. In that case the updated rows are different for each update. It is possible that each row has a few dead versions, but not too many, each row is updated just a limited number of times.
However, we have other updates which access the same row 1000s of times (up to millions of times), and that could hurt if it's like you said, i.e. if each update has to crawl over all the dead rows... I have now autovacuum in place, and I'm sure it will kick in at ~ a few 10000s of updates, but in the meantime it could get bad. In any case, I suppose that those disk pages should be in OS cache pretty soon and stay there, so I still don't understand why the disk usage is 100% in this case (with very low CPU activity, the CPUs are mostly waiting/idle)... the amount of actively used data is not that big. I'll try to vacuum through cron jobs the most exposed tables to this multiple-dead-row-versions symptom, cause autovacuum probably won't do it often enough. Let's see if it helps... Thanks, Csaba. On Thu, 2005-10-13 at 14:40, Andrew Sullivan wrote: > On Thu, Oct 13, 2005 at 10:15:03AM +0200, Csaba Nagy wrote: > > > > OK, this sounds interesting, but I don't understand: why would an update > > "chase down a lot of dead tuples" ? Should I read up on some docs, cause > > I obviously don't know enough about how updates work on postgres... > > Right. Here's the issue: > > MVCC does not replace rows when you update. Instead, it marks the > old row as expired, and sets the new values. The old row is still > there, and it's available for other transactions who need to see it. > As the docs say (see > <http://www.postgresql.org/docs/8.0/interactive/transaction-iso.html>), > "In effect, a SELECT query sees a snapshot of the database as of the > instant that that query begins to run." And that can be true because > the original data is still there, although marked as expired for > subsequent transactions. > > UPDATE works the same was as SELECT in terms of searching for rows > (so does any command that searches for data). > > Now, when you select data, you actually have to traverse all the > existing versions of the tuple in order to get the one that's live > for you. This is normally not a problem: VACUUM goes around and > cleans out old, expired data that is not live for _anyone_. It does > this by looking for the oldest transaction that is open. (As far as > I understand it, this is actually the oldest transaction in the > entire back end; but I've never understood why that should the the > case, and I'm too incompetent/dumb to understand the code, so I may > be wrong on this point.) If you have very long-running transactions, > then, you can end up with a lot of versions of dead tuples on the > table, and so reading the few records you want can turn out actually > to be a very expensive operation, even though it ought to be cheap. > > You can see this by using the VERBOSE option to VACUUM: > > test=# VACUUM VERBOSE eval1 ; > INFO: vacuuming "public.eval1" > INFO: "eval1": found 0 removable, 0 nonremovable row versions in 0 > pages > DETAIL: 0 dead row versions cannot be removed yet. > There were 0 unused item pointers. > 0 pages are entirely empty. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: vacuuming "pg_toast.pg_toast_18831" > INFO: index "pg_toast_18831_index" now contains 0 row versions in 1 > pages > DETAIL: 0 index pages have been deleted, 0 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: "pg_toast_18831": found 0 removable, 0 nonremovable row > versions in 0 pages > DETAIL: 0 dead row versions cannot be removed yet. > There were 0 unused item pointers. > 0 pages are entirely empty. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > VACUUM > > Note those "removable" and "nonremovable" row versions. It's the > unremovable ones that can hurt. WARNING: doing VACUUM on a big table > on a disk that's already pegged is going to cause you performance > pain, because it scans the whole table. In some cases, though, you > have no choice: if the winds are already out of your sails, and > you're effectively stopped, anything that might get you moving again > is an improvement. > > > And how would the analyze help in finding this out ? I thought it would > > only show me additionally the actual timings, not more detail in what > > was done... > > Yes, it shows the actual timings, and the actual number of rows. But > if the estimates that the planner makes are wildly different than the > actual results, then you know your statistics are wrong, and that the > planner is going about things the wrong way. ANALYSE is a big help. > There's also a verbose option to it, but it's usually less useful in > production situations. > > A ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match