Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Here is a second attempt at autovacuum integration.
A few comments:
* I strongly disagree with keeping updatable state in a catalog. In the
first place, that will cause autovac itself to create vacuumable trash.
In the second place, that means you can't modify the internal state kept
by autovacuum without forcing initdb; which is not a good situation,
considering how much change this code is likely to go through.
I think all the updatable state should be kept in the pgstats file.
(Memo to self: let's add a version number header to the pgstats file,
so that we can change its format without requiring an initdb to do so.)
pg_autovacuum should only contain user-settable parameters --- which is
still putting us at nontrivial risk for initdb constraints, but it's way
better than internal state. If you do that, then pg_autovacuum need
only contain entries for tables that have been given non-default
parameter settings (ie, those for which the user wants to override
global settings given in postgresql.conf).
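To illustrate the override idea, here is a minimal C sketch (the struct
and field names are hypothetical, not the actual catalog definition):
a pg_autovacuum row, if present, supplies per-table values, and a -1
sentinel (or a missing row) means "fall back to the postgresql.conf
setting".

```c
#include <stddef.h>

/* Hypothetical per-table overrides; only user-settable parameters,
 * no internal state.  -1 means "use the global GUC default". */
typedef struct AutoVacOpts
{
	int			vac_base_thresh;	/* -1 => use global threshold GUC */
	double		vac_scale_factor;	/* -1 => use global scale-factor GUC */
} AutoVacOpts;

static int
effective_vac_thresh(const AutoVacOpts *opts, int guc_default)
{
	/* a per-table override wins; a NULL opts (no catalog row) or a
	 * -1 sentinel falls back to the postgresql.conf value */
	if (opts != NULL && opts->vac_base_thresh >= 0)
		return opts->vac_base_thresh;
	return guc_default;
}
```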
* I'm pretty dubious about adding a syscache for pg_autovacuum.
Where exactly are the repeated fetches going to come from to justify
the overhead of a cache?
* The signal processing needs re-work; you can't just randomly assign
signal numbers, you need to keep these things largely consistent with
the other subprocesses. In particular, if this thing is going to be
running transactions then it MUST honor SIGALRM (eg, for deadlock
timeout checks) and SIGUSR1 (sinval catchup interrupt), and I don't
think you get to ignore SIGTERM either (may get this from init).
I think it's a pretty bad idea to use SIGUSR2 for something other than
what regular backends use it for, too. (Consider the prospect that for
some reason your PID occurs in pg_listener.) It'd be a good idea to
honor SIGINT with the normal meaning (query cancel, ie, abort
transaction, which in this context would also imply process shutdown)
and use that to shut down the daemon at postmaster stop. In short,
the signal handling had better look a whole lot more like a normal
* Speaking of shutdown, you can't stop the bgwriter until you know
that the autovac daemon is dead. In this respect too, it's much more
like a backend than like any of the other support processes. Signal it
when you signal the backends, and don't advance to the bgwriter kill
phase until autovac and all the backends are known gone.
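The ordering constraint amounts to something like this (a sketch, not
the actual postmaster state machine):

```c
/* Simplified shutdown phases: bgwriter must outlive the autovac
 * daemon, because autovac runs transactions and dirties buffers. */
typedef enum
{
	PM_WAIT_BACKENDS,			/* backends and/or autovac still alive */
	PM_KILL_BGWRITER			/* safe to signal the bgwriter now */
} PMState;

static PMState
advance_shutdown(int live_backends, int autovac_alive)
{
	/* treat autovac like just another backend for shutdown purposes:
	 * do not advance until it AND all backends are known gone */
	if (live_backends > 0 || autovac_alive)
		return PM_WAIT_BACKENDS;
	return PM_KILL_BGWRITER;
}
```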
* I see you have an autovac_init function to "annoy the user", but
shouldn't this be checked every time we are about to spawn an autovac
process?
* I don't see any special checks for shared catalogs, which means they
are probably going to be over-vacuumed; or possibly under-vacuumed if
you fail to track the update stats for them in a single place rather
than in each database. It'd probably be a good idea to nominate just
one database to be the place from which shared catalogs are vacuumed;
or treat them as a completely separate special case.
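The nominated-database approach could be as simple as this sketch
(the Oid typedef and the choice of nominee are stand-ins for the real
thing):

```c
#include <stdbool.h>

typedef unsigned int Oid;		/* stand-in for the real typedef */

#define NOMINATED_DB_OID ((Oid) 1)	/* hypothetical nominee */

/* Vacuum shared catalogs from exactly one database, so they are
 * neither over-vacuumed (once per database) nor skipped entirely. */
static bool
should_vacuum_rel(bool relisshared, Oid my_database)
{
	if (relisshared)
		return my_database == NOMINATED_DB_OID;
	return true;				/* non-shared rels: normal per-db rules */
}
```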
* I have no objection to adding extra entry points to vacuum.c to
simplify the calls to it.
* With respect to comments like:
+ * We just skip a table not found on the stat hash. This is
+ * because whatever we do here, there won't be good statistics
+ * next time around, so we would do this same action again. It is
+ * tempting to issue an ANALYZE, but it won't work because ANALYZE
+ * doesn't update the stat system.
ISTM the entire point of "autovac integration" is to make sure the rest
of the system does what's needed to support autovac. If ANALYZE needs
to send something to the stats system, make it do so. (ISTM it'd be
reasonable for ANALYZE to make and report an estimate of the dead tuple
count, which after all is what you are hoping all the rest of the stats
machinery will derive for you...)
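For instance, something along these lines (pgstat_report_analyze and
the entry struct are hypothetical sketches of what ANALYZE would send
to the stats collector):

```c
/* Simplified stats-collector table entry */
typedef struct PgStatTabEntry
{
	long		n_live_tuples;
	long		n_dead_tuples;
} PgStatTabEntry;

/* ANALYZE already samples the table; have it report its sample-based
 * estimates so autovac never finds a missing/stale stats entry. */
static void
pgstat_report_analyze(PgStatTabEntry *tabentry,
					  long live_est, long dead_est)
{
	/* replace the running counters with the fresh estimates */
	tabentry->n_live_tuples = live_est;
	tabentry->n_dead_tuples = dead_est;
}
```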
regards, tom lane