Ok, scratch that :-) Another round of braindumping below.
The launcher starts one worker in each database. This worker does no
vacuum work itself; it just reports how much vacuum effort is needed in the
database. "Vacuum effort" is measured as the total number of pages in
need of vacuum, being the sum of relpages of all tables and indexes
needing vacuum. (Note: we weight heap pages the same as index pages.
Is this OK?)
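The effort metric above can be sketched as follows. This is a toy
illustration, not real catalog access: the relation list and the
needs_vacuum flag are hypothetical stand-ins for what the worker would
read from pg_class and the stats collector.

```python
# "Vacuum effort": sum of relpages over all relations needing vacuum,
# with heap and index pages weighted equally (as questioned above).

def vacuum_effort(relations):
    """Total pages in need of vacuum for one database."""
    return sum(rel["relpages"] for rel in relations if rel["needs_vacuum"])

relations = [
    {"name": "orders",      "relpages": 5000, "needs_vacuum": True},
    {"name": "orders_pkey", "relpages": 1200, "needs_vacuum": True},  # index
    {"name": "archive",     "relpages": 9000, "needs_vacuum": False},
]
print(vacuum_effort(relations))  # 6200
```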
Create a plan for vacuuming all those databases within the constraints
of max_workers. Databases needing the most work are vacuumed first,
one worker per database, so up to max_workers databases are being
vacuumed in parallel at any given time. When one database is finished,
the launcher starts a worker in the next database in the list.
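A minimal sketch of that dispatch loop, under stated assumptions: each
database's effort has already been reported, one worker serves one
database, and at most max_workers run at once. Popping the head of
`active` stands in for waiting on a real worker to finish.

```python
# Dispatch databases to worker slots, largest reported effort first.

def run_plan(efforts, max_workers):
    """efforts: {dbname: pages needing vacuum}.  Returns dispatch order."""
    pending = sorted(efforts, key=efforts.get, reverse=True)
    active, order = [], []
    while pending or active:
        # Fill any free slots from the head of the sorted list.
        while pending and len(active) < max_workers:
            db = pending.pop(0)
            active.append(db)
            order.append(db)
        active.pop(0)  # a worker finished; its slot frees up
    return order

print(run_plan({"a": 10, "b": 500, "c": 90}, 2))  # ['b', 'c', 'a']
```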
When the plan is complete (i.e. the list is empty) we can do the whole
thing again, excluding the databases that are still being vacuumed.
Perhaps we should wait autovacuum_naptime seconds between finishing one
vacuum round in all databases and starting the next. How do we measure
this: do we start sleeping when the last worker finishes, or when the
list is empty?
Perhaps we should reserve a worker for vacuuming hot tables. Launcher
then uses max_workers-1 workers for the above plan, and the spare worker
continuously cycles: it connects to one database, vacuums its hot
tables, exits, and the launcher restarts it on the next database.
Definitional problem: how to decide what's a hot table? One idea (the
simplest) is to let the DBA define it.
Thus, at most two workers are on any database: one of them is working on
normal tables, the other on hot tables.
(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables). This may be overkill.)
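The spare-worker idea above amounts to holding one slot back from the
normal plan and rotating it through the databases. A sketch, with
hypothetical names; the DBA-flagged hot-table list is assumed to exist
per database:

```python
from itertools import cycle, islice

# With one slot reserved, max_workers - 1 slots serve the normal plan
# and the spare worker rotates through the databases, touching only
# DBA-flagged hot tables in each.

def hot_table_cycle(databases, connections):
    """Databases the spare worker connects to, in order."""
    return list(islice(cycle(databases), connections))

print(hot_table_cycle(["db1", "db2", "db3"], 5))
# ['db1', 'db2', 'db3', 'db1', 'db2']
```

At most two workers then touch any one database: one on normal tables,
one on hot tables.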
Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc. He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could tackle that after
this proposal is implemented. So this "take 2" could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).
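For a rough feel of how the existing knobs already imply a MB/s figure,
here is back-of-the-envelope arithmetic with illustrative numbers (a
cost budget spent per sleep, every page costing one credit in the
best case of all buffer hits); this is not a proposal for the actual
conversion:

```python
# Translate cost-based throttling into an approximate MB/s ceiling.
cost_limit = 200       # credits spent before each sleep
cost_delay_ms = 20     # sleep length once credits are spent
page_cost = 1          # credits per page processed (buffer hit case)
page_size_kb = 8       # standard PostgreSQL block size

pages_per_second = (cost_limit / page_cost) * (1000 / cost_delay_ms)
mb_per_second = pages_per_second * page_size_kb / 1024
print(mb_per_second)  # 78.125
```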
Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace. I say we're not :-) If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain on
each tablespace, sleeping as appropriate. I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.
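The per-tablespace sharing could look roughly like this: workers
register which tablespaces they are touching, and each tablespace's
bandwidth budget (a hypothetical per-tablespace MB/s limit) is split
evenly among its current users. Names and the even split are
assumptions, just to make the communication requirement concrete.

```python
# Split each tablespace's MB/s budget among the workers using it.

def per_worker_budget(tablespace_budgets, worker_tablespaces):
    """Return each worker's MB/s allowance on each tablespace it uses."""
    users = {}
    for worker, spaces in worker_tablespaces.items():
        for ts in spaces:
            users.setdefault(ts, []).append(worker)
    return {
        worker: {ts: tablespace_budgets[ts] / len(users[ts]) for ts in spaces}
        for worker, spaces in worker_tablespaces.items()
    }

budgets = per_worker_budget(
    {"pg_default": 40.0, "fast_ssd": 120.0},
    {"w1": ["pg_default"], "w2": ["pg_default", "fast_ssd"]},
)
print(budgets["w1"]["pg_default"])  # 20.0
print(budgets["w2"]["fast_ssd"])   # 120.0
```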
I'd like to know if this responds to the mentioned people's objections.
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support