I've been thinnking how to improve autovacuum so that we can convince
more people that it can be enabled by default. Here are my thoughts.
There are two areas of improvements:
1. scheduling, and
2. process handling, i.e., how to have multiple vacuum processes running
at any time.
I ripped out the part about having multiple "vacuum queues", as it was
incomplete and it was also getting too complex. We need to discuss how
to do that, because it's a fundamental part of this proposal; the idea
is to be able to have several vacuums running at any time, but we need
to find a way to specify a policy for it.
My idea here is to morph the current autovacuum daemon from an agent
that itself runs a vacuum command, into something that launches other
processes to run those vacuum commands. I'll call this "the autovacuum
launcher process", or the launcher for short. The idea here is that the
launcher can take care of the scheduling while the worker processes do
their work. If the launcher then determines that a particular instant
there should be two vacuums running, then it simply starts two worker
The launcher would be running continuously, akin to the postmaster, but
would be obviously under control of the latter, so it's postmaster's
responsability to start and stop the launcher. The launcher would be
connected to shared memory, so it can scan system catalogs to load the
schedule (stored in system catalogs) into memory. If the launcher dies,
the postmaster should treat it like any other process' crash and cause a
The workers would not be postmaster's direct children, which could be a
problem. I'm open to ideas here, but I don't like using the postmaster
directly as a launcher, because of the shmem connection, which would
take robustness away from the postmaster. One idea to solve this is to
have the launcher process communicate child process IDs to the
postmaster, so that when it (the postmaster) wants to stop, it has those
additional PIDs in its process list and can signal them to stop. The
launcher process would also signal when it detects that one of the
workers stopped, and the postmaster would remove that process from the
list. This communication could be made to happen via named pipes, and
since the messages are so simple, there's no reliability concern for the
postmaster; it's very easy to verify that a message is correct by
checking whether the process is actually killable by kill(0).
Another idea that I discarded was to have the launcher communicate back
to the postmaster when new workers should be started. My fear is that
this type of communication (a lot more complex that just sending a PID)
could be a cause for postmaster instability.
We introduce the following concepts:
1. table groups. We'll have a system catalog for storing OID and group
name, and another catalog for membership, linking relid to group OID.
2. interval groups. We'll have a catalog for storing igroup name and
OID, and another catalog for membership. We identify an interval by:
- month of year
- day of month
- day of week
- start time of day
- end time of day
This is modelled after crontabs.
Additionally, we'll have another catalog on which we'll store table
groups to interval groups relationships. On that catalog we'll also
store those autovacuum settings that we want to be able to override:
whether to disable it for this interval group, or the values for the
So the scheduler, at startup, loads the whole schedule in memory, and
then wakes up at reasonable intervals and checks whether these equations
hold for some of the tables it's monitoring. If they do, then launch a
new worker process to do the job.
We need a mechanism for having the scheduler rescan the schedule when a
user modifies the catalog -- maybe having a trigger that sends a signal
to the process is good enough (implementation detail: the signal must be
routed via the postmaster, since the backend cannot hope to know the
scheduler's PID. This is easy enough to do.)
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings