Robert Haas wrote:
> Well, if there are more tables that need vacuuming than there are
> workers available at any given time, there will be a delay.  We
> probably don't keep track of that delay at present, but we could.

There are at least four interesting numbers to collect each time autovacuum runs:

1) This one: when was the threshold crossed? I believe one of the AV workers would have to pause periodically to update this if they're all busy doing work.
2) When did the last autovacuum start?
3) How many dead rows were there at the point when it started?
4) When did the last autovacuum end? (currently the only value stored)

There may be a 5th piece of state worth exposing/saving that I haven't looked at yet: something related to how much work was skipped by the partial vacuum logic introduced in 8.4. I haven't looked at that code enough to know which metric best measures its effectiveness, but my gut feeling is that it's eventually going to be critical for distinguishing between the various common types of vacuum-heavy workloads that show up. A rough sketch of how all of these might be exposed is below.
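Just to make that concrete, here's what the extra per-table counters might look like if they showed up next to the existing columns in pg_stat_user_tables. Every column name here except last_autovacuum is made up; treat them as placeholders for the five values above, not a proposal for the actual names:

-- Hypothetical columns; only last_autovacuum exists today
SELECT relname,
       last_autovacuum_threshold_crossed,  -- (1) when the table first qualified
       last_autovacuum_start,              -- (2) when a worker actually got to it
       last_autovacuum_dead_tuples,        -- (3) dead rows at that start time
       last_autovacuum,                    -- (4) when it finished (exists now)
       last_autovacuum_pages_skipped       -- (5) work avoided by partial vacuum
FROM pg_stat_user_tables
ORDER BY last_autovacuum_threshold_crossed;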

All of these need to be stored in a system table/view, so that an admin can run a query to answer questions like the ones below (a rough sketch of what one such query looks like today follows the list):

-What is AV doing right now?
-How far behind is AV on tables it needs to clean but hasn't even started on?
-How long is the average AV taking on my big tables?
-As I change the AV parameters, what does it do to the runtimes against my big tables?
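The first question is about the only one you can answer cleanly today, by looking for autovacuum workers in pg_stat_activity; something along these lines, with column names as they stand in 8.4/9.0. It only shows workers that have already started on a table, and says nothing about the backlog behind them:

SELECT datname, procpid, query_start, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';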

As someone who gets found by a lot of people whose problems revolve around databases with heavy writes or update churn, limitations in how autovacuum's work gets tracked have been moving way up my priority list over the last year. I now have someone whose system is always running autovacuum on the same table, 24x7. It finishes every two days, and by the time it does the 20% threshold has already been crossed, so it starts right back up again. The "wait until a worker was available" problem isn't there, but I need a good way to track the other three things to have any hope of improving their situation. Right now, getting the data I could use means parsing log file output and periodic dumps of pg_stat_user_tables, then stitching the whole mess together.
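The closest you can get right now to knowing how far over threshold a table like that is requires recomputing the trigger point yourself from the GUC settings and comparing it against n_dead_tup, roughly like this (ignoring per-table storage parameter overrides, and remembering reltuples is only an estimate):

SELECT s.relname, s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::float8
             + current_setting('autovacuum_vacuum_scale_factor')::float8
               * c.reltuples) AS av_threshold
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.n_dead_tup >
      current_setting('autovacuum_vacuum_threshold')::float8
      + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
ORDER BY s.n_dead_tup DESC;

That tells you which tables are over the line, but not how long they've been waiting there, which is exactly what number (1) above would give you.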

You can't run a heavily updated database in the TB+ range and make sense of what autovacuum is doing without a large effort spent matching output from log_autovacuum_min_duration against the stats visible in pg_stat_user_tables. It has to get easier than that to support the sort of bigger tables it's possible to build now. And if this data starts getting tracked, we can start moving toward AV parameters that are actually expressed in real-world units, too.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


