I attach a patch to fix some issues in the autovac process handling
code.  Mainly, instead of having the launcher recheck periodically
whether a worker was able to start, the postmaster now signals the
launcher when the fork fails.
In order to do this I had to make the postmaster set a flag in shared
memory, much like the pmsignal.c code does.  According to previous
discussions this is an acceptable thing for the postmaster to do.
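
Concretely, the postmaster side boils down to the following (the names
match the patch below; the flag lives in the autovacuum shared memory
area and is a sig_atomic_t, so setting it needs no lock -- the same
trick pmsignal.c uses):

    /* in the postmaster, after fork() of a worker has failed */
    AutoVacWorkerFailed();          /* sets av_signal[AutoVacForkFailed] */
    if (AutoVacPID != 0)            /* AutoVacPID is the launcher's PID */
        kill(AutoVacPID, SIGUSR1);  /* wake the launcher up */

The launcher notices the flag in its SIGUSR1 processing, clears it,
sleeps briefly and then asks the postmaster to try starting the worker
again.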
I must admit I did not try to fill shared memory with garbage and see
whether it caused a postmaster crash.  What I did test was creating a
database, making sure that the launcher was picking it up, and then
deleting the database's directory.  This correctly causes a worker to
log errors when trying to connect, and it is handled well -- i.e. other
databases are not starved.  (One remaining problem I see is that a
database in this or a similar state, which fails to be vacuumed, *will*
cause starvation once it is in danger of Xid wraparound.)
Another thing I noticed is that I can change the "autovacuum" parameter
in postgresql.conf and send SIGHUP to the postmaster, which correctly
starts the launcher if it was previously disabled.  To make the shutdown
case act accordingly, I made the launcher check the "start daemon" flag
after rereading the config file, and quit if it's off.  I tried multiple
combinations of having it set and unset, and of changing
stats_collect_tuple_level, and it works fine.
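
In loop form the shutdown check amounts to this (simplified from the
patch; got_SIGHUP is set by the launcher's SIGHUP handler, and
autovacuum_start_daemon is the variable behind the "autovacuum" GUC):

    for (;;)
    {
        /* ... sleep until the next scheduled launch or a signal ... */

        if (got_SIGHUP)
        {
            got_SIGHUP = false;
            ProcessConfigFile(PGC_SIGHUP);

            /* autovacuum was turned off in postgresql.conf: shut down */
            if (!autovacuum_start_daemon)
                break;
        }

        /* ... launch workers as needed ... */
    }
    /* falling out of the loop takes the normal exit path */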
One problem with the patch is this (new code):
    bn = (Backend *) malloc(sizeof(Backend));
    if (bn)
    {
        bn->pid = StartAutoVacWorker();
        bn->is_autovacuum = true;
        /* we don't need a cancel key */
        if (bn->pid > 0)
        {
            /* FIXME -- unchecked memory allocation here */
            DLAddHead(BackendList, DLNewElem(bn));
If the palloc() inside DLNewElem fails, we will fail to report a "fork
failure" to the launcher. I am not sure how serious this is. One idea
that came to mind was using a PG_TRY block, sending the signal in the
CATCH block, and then rethrowing the exception. Is this acceptable?
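
Roughly, what I have in mind is the following (a sketch only, not part
of the attached patch, and ignoring for now the question of whether an
error raised inside the postmaster can really be caught this way):

    bn->pid = StartAutoVacWorker();
    bn->is_autovacuum = true;
    if (bn->pid > 0)
    {
        PG_TRY();
        {
            DLAddHead(BackendList, DLNewElem(bn));
        }
        PG_CATCH();
        {
            /* tell the launcher the worker is as good as lost */
            AutoVacWorkerFailed();
            if (AutoVacPID != 0)
                kill(AutoVacPID, SIGUSR1);
            PG_RE_THROW();
        }
        PG_END_TRY();
    }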
All in all, this patch increases the reliability of autovacuum, so I
intend to apply it shortly unless there are objections. Further
improvements might come later as I become aware of possible fixes.
--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.50
diff -c -p -r1.50 autovacuum.c
*** src/backend/postmaster/autovacuum.c 13 Jun 2007 21:24:55 -0000 1.50
--- src/backend/postmaster/autovacuum.c 19 Jun 2007 17:30:00 -0000
***************
*** 4,16 ****
*
* PostgreSQL Integrated Autovacuum Daemon
*
*
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
! * $PostgreSQL: pgsql/src/backend/postmaster/autovacuum.c,v 1.50 2007-06-13 21:24:55 alvherre Exp $
*
*-------------------------------------------------------------------------
*/
--- 4,52 ----
*
* PostgreSQL Integrated Autovacuum Daemon
*
+ * The autovacuum system is structured in two different kinds of processes: the
+ * autovacuum launcher and the autovacuum worker. The launcher is an
+ * always-running process, started by the postmaster when the autovacuum GUC
+ * parameter is set. The launcher schedules autovacuum workers to be started
+ * when appropriate. The workers are the processes which execute the actual
+ * vacuuming; they connect to a database as determined in the launcher, and
+ * once connected they examine the catalogs to select the tables to vacuum.
+ *
+ * The autovacuum launcher cannot start the worker processes by itself,
+ * because doing so would cause robustness issues (namely, failure to shut
+ * them down on exceptional conditions; and apart from that, since the launcher
+ * is connected to shared memory and is thus subject to corruption there, it is
+ * not as robust as the postmaster, which is not).  So it leaves that task to
+ * the postmaster.
+ *
+ * There is an autovacuum shared memory area, where the launcher stores
+ * information about the database it wants vacuumed. When it wants a new
+ * worker to start, it sets a flag in shared memory and sends a special signal
+ * to the postmaster.  The postmaster knows nothing more than that it must
+ * start a worker; so it forks a new child, which turns into a worker and
+ * examines shared memory to find out what database it must connect to.
+ *
+ * If the fork() call fails, the postmaster sets a flag in the shared memory
+ * area, and it sends a signal to the launcher. The launcher, upon noticing
+ * the flag, can try starting the worker again. Note that the failure can only
+ * be transient (fork failure due to high load, memory pressure, too many
+ * processes, etc); more permanent problems, like failure to connect to a
+ * database, are detected later in the worker and dealt with in a different
+ * way.
+ *
+ * When the worker is done vacuuming, it sends SIGUSR1 to the launcher.  The
+ * launcher then wakes up and can launch another worker if the schedule is
+ * too tight.  Otherwise it goes back to sleep and wakes up normally
+ * according to schedule. At this time the launcher can also balance the
+ * settings for the various remaining workers' cost-based vacuum delay feature.
+ *
*
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
! * $PostgreSQL: pgsql/src/backend/postmaster/autovacuum.c,v 1.50 2007-06-13 21:24:55 alvherre Exp $
*
*-------------------------------------------------------------------------
*/
*************** typedef struct WorkerInfoData
*** 158,185 ****
typedef struct WorkerInfoData *WorkerInfo;
/*-------------
* The main autovacuum shmem struct. On shared memory we store this main
* struct and the array of WorkerInfo structs. This struct keeps:
*
* av_launcherpid the PID of the autovacuum launcher
* av_freeWorkers the WorkerInfo freelist
* av_runningWorkers the WorkerInfo non-free queue
* av_startingWorker pointer to WorkerInfo currently being started (cleared by
* the worker itself as soon as it's up and running)
- * av_rebalance true when a worker determines that cost limits must be
- * rebalanced
*
! * This struct is protected by AutovacuumLock.
*-------------
*/
typedef struct
{
pid_t av_launcherpid;
SHMEM_OFFSET av_freeWorkers;
SHM_QUEUE av_runningWorkers;
SHMEM_OFFSET av_startingWorker;
- bool av_rebalance;
} AutoVacuumShmemStruct;
static AutoVacuumShmemStruct *AutoVacuumShmem;
--- 194,233 ----
typedef struct WorkerInfoData *WorkerInfo;
+ /*
+ * Possible signals received by the launcher from remote processes. These are
+ * stored atomically in shared memory so that other processes can set them
+ * without locking.
+ */
+ typedef enum
+ {
+ AutoVacForkFailed, /* failed trying to start a worker */
+ AutoVacRebalance, /* rebalance the cost limits */
+ AutoVacNumSignals /* must be last */
+ } AutoVacuumSignal;
+
/*-------------
* The main autovacuum shmem struct. On shared memory we store this main
* struct and the array of WorkerInfo structs. This struct keeps:
*
+ * av_signal set by other processes to indicate various conditions
* av_launcherpid the PID of the autovacuum launcher
* av_freeWorkers the WorkerInfo freelist
* av_runningWorkers the WorkerInfo non-free queue
* av_startingWorker pointer to WorkerInfo currently being started (cleared by
* the worker itself as soon as it's up and running)
*
! * This struct is protected by AutovacuumLock, except for av_signal and parts
! * of the worker list (see above).
*-------------
*/
typedef struct
{
+ sig_atomic_t av_signal[AutoVacNumSignals];
pid_t av_launcherpid;
SHMEM_OFFSET av_freeWorkers;
SHM_QUEUE av_runningWorkers;
SHMEM_OFFSET av_startingWorker;
} AutoVacuumShmemStruct;
static AutoVacuumShmemStruct *AutoVacuumShmem;
*************** StartAutoVacLauncher(void)
*** 316,344 ****
/*
* Main loop for the autovacuum launcher process.
- *
- * The signalling between launcher and worker is as follows:
- *
- * When the worker has finished starting up, it stores its PID in wi_workerpid
- * and sends a SIGUSR1 signal to the launcher. The launcher then knows that
- * the postmaster is ready to start a new worker. We do it this way because
- * otherwise we risk calling SendPostmasterSignal() when the postmaster hasn't
- * yet processed the last one, in which case the second signal would be lost.
- * This is only useful when two workers need to be started close to one
- * another, which should be rare but it's possible.
- *
- * When a worker exits, it resets the WorkerInfo struct and puts it back into
- * the free list. If there is no free worker slot, it will also signal the
- * launcher, which then wakes up and can launch a new worker if it needs to.
- * Note that we only need to do it when there's no free worker slot, because
- * otherwise there is no need -- the launcher would be awakened normally per
- * schedule.
- *
- * There is a potential problem if, for some reason, a worker starts and is not
- * able to bootstrap itself correctly. To prevent this situation from starving
- * the whole system, the launcher checks the launch time of the "starting
- * worker". If it's too old (older than autovacuum_naptime seconds), it resets
- * the worker entry and puts it back into the free list.
*/
NON_EXEC_STATIC void
AutoVacLauncherMain(int argc, char *argv[])
--- 364,369 ----
*************** AutoVacLauncherMain(int argc, char *argv
*** 547,552 ****
--- 572,581 ----
got_SIGHUP = false;
ProcessConfigFile(PGC_SIGHUP);
+ /* shutdown requested in config file */
+ if (!autovacuum_start_daemon)
+ break;
+
/* rebalance in case the default cost parameters changed */
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
autovac_balance_cost();
*************** AutoVacLauncherMain(int argc, char *argv
*** 556,640 ****
rebuild_database_list(InvalidOid);
}
! /* a worker started up or finished */
if (got_SIGUSR1)
{
got_SIGUSR1 = false;
/* rebalance cost limits, if needed */
! if (AutoVacuumShmem->av_rebalance)
{
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! AutoVacuumShmem->av_rebalance = false;
autovac_balance_cost();
LWLockRelease(AutovacuumLock);
}
- }
-
- /*
- * There are some conditions that we need to check before trying to
- * start a launcher. First, we need to make sure that there is a
- * launcher slot available. Second, we need to make sure that no other
- * worker is still starting up.
- */
-
- LWLockAcquire(AutovacuumLock, LW_SHARED);
-
- can_launch = (AutoVacuumShmem->av_freeWorkers != INVALID_OFFSET);
-
- if (can_launch && AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
- {
- WorkerInfo worker = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! if (current_time == 0)
! current_time = GetCurrentTimestamp();
!
! /*
! * We can't launch another worker when another one is still
! * starting up, so just sleep for a bit more; that worker will wake
! * us up again as soon as it's ready. We will only wait
! * autovacuum_naptime seconds for this to happen however. Note
! * that failure to connect to a particular database is not a
! * problem here, because the worker removes itself from the
! * startingWorker pointer before trying to connect; only low-level
! * problems, like fork() failure, can get us here.
! */
! if (TimestampDifferenceExceeds(worker->wi_launchtime, current_time,
! autovacuum_naptime * 1000))
! {
! LWLockRelease(AutovacuumLock);
! LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! /*
! * No other process can put a worker in starting mode, so if
! * startingWorker is still INVALID after exchanging our lock,
! * we assume it's the same one we saw above (so we don't
! * recheck the launch time).
! */
! if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
! {
! worker = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! worker->wi_dboid = InvalidOid;
! worker->wi_tableoid = InvalidOid;
! worker->wi_workerpid = 0;
! worker->wi_launchtime = 0;
! worker->wi_links.next = AutoVacuumShmem->av_freeWorkers;
! AutoVacuumShmem->av_freeWorkers = MAKE_OFFSET(worker);
! AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! }
! }
! else
{
/*
! * maybe the postmaster neglected this start signal --
! * resend it. Note: the constraints in
! * launcher_determine_sleep keep us from delivering signals too
! * quickly (at most once every 100ms).
*/
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
! can_launch = false;
}
}
! LWLockRelease(AutovacuumLock); /* either shared or exclusive */
if (can_launch)
{
--- 585,633 ----
rebuild_database_list(InvalidOid);
}
! /* a worker finished, or postmaster signalled failure to start a worker */
if (got_SIGUSR1)
{
got_SIGUSR1 = false;
/* rebalance cost limits, if needed */
! if (AutoVacuumShmem->av_signal[AutoVacRebalance])
{
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! AutoVacuumShmem->av_signal[AutoVacRebalance] = false;
autovac_balance_cost();
LWLockRelease(AutovacuumLock);
}
! if (AutoVacuumShmem->av_signal[AutoVacForkFailed])
{
/*
! * If the postmaster failed to start a new worker, we sleep
! * for a little while and resend the signal. The new worker's
! * state is still in memory, so this is sufficient. After
! * that, we restart the main loop.
! *
! * XXX should we put a limit on the number of times we retry?
! * I don't think it makes much sense, because a future start
! * of a worker will continue to fail in the same way.
*/
+ AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
+ pg_usleep(100000L); /* 100ms */
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
! continue;
}
}
!
! /*
! * We cannot launch a worker if there are no free worker slots, or if
! * there is already a worker starting up.
! */
! can_launch = true;
! LWLockAcquire(AutovacuumLock, LW_SHARED);
! if (AutoVacuumShmem->av_freeWorkers == INVALID_OFFSET ||
! AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
! can_launch = false;
! LWLockRelease(AutovacuumLock);
if (can_launch)
{
*************** launch_worker(TimestampTz now)
*** 1197,1202 ****
--- 1190,1206 ----
}
}
+ /*
+ * Called from the postmaster to signal a failure to fork a process to become
+ * a worker.  The postmaster should kill(SIGUSR1) the launcher shortly
+ * after calling this function.
+ */
+ void
+ AutoVacWorkerFailed(void)
+ {
+ AutoVacuumShmem->av_signal[AutoVacForkFailed] = true;
+ }
+
/* SIGHUP: set flag to re-read config file at next convenient time */
static void
avl_sighup_handler(SIGNAL_ARGS)
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1435,1442 ****
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/*
! * beware of startingWorker being INVALID; this could happen if the
! * launcher thinks we've taking too long to start.
*/
if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
{
--- 1439,1446 ----
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/*
! * beware of startingWorker being INVALID; this should not happen but keep
! * the check for safety
*/
if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
{
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1463,1468 ****
--- 1467,1473 ----
else
{
/* no worker entry for me, go away */
+ elog(WARNING, "autovacuum worker started without a worker entry");
dbid = InvalidOid;
LWLockRelease(AutovacuumLock);
}
*************** FreeWorkerInfo(int code, Datum arg)
*** 1525,1543 ****
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/*
! * If this worker shuts down when there is no free worker slot, wake
! * the launcher up so that he can launch a new worker immediately if
! * required. We only save the launcher's PID in local memory here --
! * the actual signal will be sent when the PGPROC is recycled, because
! * that is when the new worker can actually be launched.
*
* We somewhat ignore the risk that the launcher changes its PID
* between we reading it and the actual kill; we expect ProcKill to be
* called shortly after us, and we assume that PIDs are not reused too
* quickly after a process exits.
*/
! if (AutoVacuumShmem->av_freeWorkers == INVALID_OFFSET)
! AutovacuumLauncherPid = AutoVacuumShmem->av_launcherpid;
SHMQueueDelete(&MyWorkerInfo->wi_links);
MyWorkerInfo->wi_links.next = AutoVacuumShmem->av_freeWorkers;
--- 1530,1547 ----
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/*
! * Wake the launcher up so that he can launch a new worker immediately
! * if required. We only save the launcher's PID in local memory here;
! * the actual signal will be sent when the PGPROC is recycled. Note
! * that we always do this, so that the launcher can rebalance the cost
! * limit setting of the remaining workers.
*
* We somewhat ignore the risk that the launcher changes its PID
* between we reading it and the actual kill; we expect ProcKill to be
* called shortly after us, and we assume that PIDs are not reused too
* quickly after a process exits.
*/
! AutovacuumLauncherPid = AutoVacuumShmem->av_launcherpid;
SHMQueueDelete(&MyWorkerInfo->wi_links);
MyWorkerInfo->wi_links.next = AutoVacuumShmem->av_freeWorkers;
*************** FreeWorkerInfo(int code, Datum arg)
*** 1556,1562 ****
* now that we're inactive, cause a rebalancing of the surviving
* workers
*/
! AutoVacuumShmem->av_rebalance = true;
LWLockRelease(AutovacuumLock);
}
}
--- 1560,1566 ----
* now that we're inactive, cause a rebalancing of the surviving
* workers
*/
! AutoVacuumShmem->av_signal[AutoVacRebalance] = true;
LWLockRelease(AutovacuumLock);
}
}
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.527
diff -c -p -r1.527 postmaster.c
*** src/backend/postmaster/postmaster.c 22 Mar 2007 19:53:30 -0000 1.527
--- src/backend/postmaster/postmaster.c 18 Jun 2007 21:00:56 -0000
*************** StartAutovacuumWorker(void)
*** 3830,3861 ****
return;
bn = (Backend *) malloc(sizeof(Backend));
! if (!bn)
{
! ereport(LOG,
! (errcode(ERRCODE_OUT_OF_MEMORY),
! errmsg("out of memory")));
! return;
! }
!
! bn->pid = StartAutoVacWorker();
! bn->is_autovacuum = true;
! /* we don't need a cancel key */
! if (bn->pid > 0)
! {
! DLAddHead(BackendList, DLNewElem(bn));
#ifdef EXEC_BACKEND
! ShmemBackendArrayAdd(bn);
#endif
! }
! else
! {
! /* not much we can do */
! ereport(LOG,
! (errmsg("could not fork new process for autovacuum: %m")));
free(bn);
}
}
/*
--- 3830,3863 ----
return;
bn = (Backend *) malloc(sizeof(Backend));
! if (bn)
{
! bn->pid = StartAutoVacWorker();
! bn->is_autovacuum = true;
! /* we don't need a cancel key */
! if (bn->pid > 0)
! {
! /* FIXME -- unchecked memory allocation here */
! DLAddHead(BackendList, DLNewElem(bn));
#ifdef EXEC_BACKEND
! ShmemBackendArrayAdd(bn);
#endif
! /* all OK */
! return;
! }
!
! /* failed, fall through to report */
! /* XXX the reason for the failure is not reported??? */
free(bn);
}
+ else
+ elog(LOG, "out of memory");
+
+ /* report the failure to the launcher */
+ AutoVacWorkerFailed();
+ if (AutoVacPID != 0)
+ kill(AutoVacPID, SIGUSR1);
}
/*
Index: src/include/postmaster/autovacuum.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/postmaster/autovacuum.h,v
retrieving revision 1.10
diff -c -p -r1.10 autovacuum.h
*** src/include/postmaster/autovacuum.h 18 Apr 2007 16:44:18 -0000 1.10
--- src/include/postmaster/autovacuum.h 18 Jun 2007 18:37:30 -0000
*************** extern bool IsAutoVacuumWorkerProcess(vo
*** 42,47 ****
--- 42,49 ----
extern void autovac_init(void);
extern int StartAutoVacLauncher(void);
extern int StartAutoVacWorker(void);
+ /* called from postmaster when a worker could not be forked */
+ extern void AutoVacWorkerFailed(void);
/* autovacuum cost-delay balancer */
extern void AutoVacuumUpdateDelay(void);