I attach a patch to fix some issues in the autovac process handling code. Mainly, instead of rechecking periodically in the launcher whether a worker was able to start or not, we now have the postmaster signal the launcher when starting a worker fails.
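To illustrate the shape of that handshake outside the backend, here is a toy, self-contained program; none of it is PostgreSQL code and all of the names are invented for the example. A "postmaster" parent records a pretend fork failure in a sig_atomic_t flag that lives in shared memory, and then wakes a "launcher" child with SIGUSR1 -- which is essentially what the patch does with av_signal[AutoVacForkFailed] plus a kill() of the launcher.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_SIGUSR1 = 0;

static void
sigusr1_handler(int signo)
{
    (void) signo;
    got_SIGUSR1 = 1;
}

int
main(void)
{
    /* one flag in shared memory, standing in for av_signal[AutoVacForkFailed] */
    volatile sig_atomic_t *fork_failed;
    pid_t launcher_pid;

    fork_failed = mmap(NULL, sizeof(*fork_failed), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (fork_failed == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }
    *fork_failed = 0;

    launcher_pid = fork();
    if (launcher_pid < 0)
    {
        perror("fork");
        return 1;
    }

    if (launcher_pid == 0)
    {
        /* child: plays the launcher; sleep until the postmaster signals us */
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sigusr1_handler;
        sigaction(SIGUSR1, &sa, NULL);

        while (!got_SIGUSR1)
            pause();

        if (*fork_failed)
            printf("launcher: worker fork failed, will retry later\n");
        return 0;
    }

    /* parent: plays the postmaster; pretend that forking a worker just failed */
    sleep(1);                   /* crude synchronization, for the example only */
    *fork_failed = 1;           /* the equivalent of AutoVacWorkerFailed() */
    kill(launcher_pid, SIGUSR1);

    waitpid(launcher_pid, NULL, 0);
    return 0;
}

The point of using a sig_atomic_t flag is that the postmaster can set it without taking any lock, which is the same constraint pmsignal.c operates under.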
In order to do this I had to make the postmaster set a flag in shared memory, much like the pmsignal.c code does. According to previous discussions this is an acceptable thing for the postmaster to do. I must admit I did not try to fill shared memory with garbage and see whether it caused a postmaster crash.

What I did test was creating a database, making sure that the launcher was picking it up, and then deleting the database's directory. This correctly causes a worker to log errors when trying to connect, and the situation is handled well -- i.e. other databases are not starved. (One remaining problem I see is that a database in this state, or a similar one that fails to be vacuumed, *will* cause starvation if it is in danger of Xid wraparound.)

Another thing I noticed is that I can change the "autovacuum" parameter in postgresql.conf and send SIGHUP to the postmaster, which correctly starts the launcher if it was previously disabled. To make the shutdown case act accordingly, I made the launcher check the "start daemon" flag after rereading the config file, and quit if it's off. I tried multiple combinations of having it set and unset, changing stats_collect_tuple_level, and it works fine.

One problem with the patch is this (new code):

      bn = (Backend *) malloc(sizeof(Backend));
  !   if (bn)
      {
  !       bn->pid = StartAutoVacWorker();
  !       bn->is_autovacuum = true;
  !       /* we don't need a cancel key */
  !       if (bn->pid > 0)
  !       {
  !           /* FIXME -- unchecked memory allocation here */
  !           DLAddHead(BackendList, DLNewElem(bn));

If the palloc() inside DLNewElem fails, we will fail to report a "fork failure" to the launcher. I am not sure how serious this is. One idea that came to mind was using a PG_TRY block, sending the signal in the CATCH block, and then rethrowing the exception. Is this acceptable?

All in all, this patch increases the reliability of autovacuum, so I intend to apply it shortly unless there are objections. Further improvements might come later as I become aware of possible fixes.

-- 
Alvaro Herrera                    http://www.amazon.com/gp/registry/DXLWNGRJD34J
"There are those who acquire the bad habit of being unhappy" (M. A. Evans)
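P.S.: To make the PG_TRY idea concrete, this is roughly the shape I have in mind inside StartAutovacuumWorker() -- an untested sketch only, using the backend's standard PG_TRY/PG_CATCH/PG_RE_THROW macros; whether it is acceptable to do this inside the postmaster is exactly what I'm asking above:

    /* untested sketch -- not part of the attached patch */
    PG_TRY();
    {
        /* DLNewElem() can fail in palloc(); don't lose the failure report */
        DLAddHead(BackendList, DLNewElem(bn));
    }
    PG_CATCH();
    {
        /* tell the launcher before propagating the error */
        AutoVacWorkerFailed();
        if (AutoVacPID != 0)
            kill(AutoVacPID, SIGUSR1);
        PG_RE_THROW();
    }
    PG_END_TRY();

Another way out, of course, would be to do the allocation before forking the worker, so that nothing in this path can fail afterwards.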
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.50
diff -c -p -r1.50 autovacuum.c
*** src/backend/postmaster/autovacuum.c    13 Jun 2007 21:24:55 -0000    1.50
--- src/backend/postmaster/autovacuum.c    19 Jun 2007 17:30:00 -0000
***************
*** 4,16 ****
  *
  * PostgreSQL Integrated Autovacuum Daemon
  *
  *
  * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
! *    $PostgreSQL: pgsql/src/backend/postmaster/autovacuum.c,v 1.50 2007-06-13 21:24:55 alvherre Exp $
  *
  *-------------------------------------------------------------------------
  */
--- 4,52 ----
  *
  * PostgreSQL Integrated Autovacuum Daemon
  *
+ * The autovacuum system is structured in two different kinds of processes: the
+ * autovacuum launcher and the autovacuum worker.  The launcher is an
+ * always-running process, started by the postmaster when the autovacuum GUC
+ * parameter is set.  The launcher schedules autovacuum workers to be started
+ * when appropriate.  The workers are the processes which execute the actual
+ * vacuuming; they connect to a database as determined in the launcher, and
+ * once connected they examine the catalogs to select the tables to vacuum.
+ *
+ * The autovacuum launcher cannot start the worker processes by itself,
+ * because doing so would cause robustness issues (namely, failure to shut
+ * them down on exceptional conditions; and apart from that, since the launcher
+ * is connected to shared memory and is thus subject to corruption there, it is
+ * not as robust as the postmaster which is not).  So it leaves that task to
+ * the postmaster.
+ *
+ * There is an autovacuum shared memory area, where the launcher stores
+ * information about the database it wants vacuumed.  When it wants a new
+ * worker to start, it sets a flag in shared memory and sends a special signal
+ * to the postmaster.  Then postmaster knows nothing more than it must start a
+ * worker; so it forks a new child, which turns into a worker and examines
+ * shared memory to know what database it must connect to.
+ *
+ * If the fork() call fails, the postmaster sets a flag in the shared memory
+ * area, and it sends a signal to the launcher.  The launcher, upon noticing
+ * the flag, can try starting the worker again.  Note that the failure can only
+ * be transient (fork failure due to high load, memory pressure, too many
+ * processes, etc); more permanent problems, like failure to connect to a
+ * database, are detected later in the worker and dealt with in a different
+ * way.
+ *
+ * When the worker is done vacuuming it sends SIGUSR1 to the launcher.  The
+ * launcher then wakes up and is able to launch another worker, if the schedule
+ * is too tight.  Otherwise it will go back to sleeping and awake normally
+ * according to schedule.  At this time the launcher can also balance the
+ * settings for the various remaining workers' cost-based vacuum delay feature.
+ *
  *
  * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
! *    $PostgreSQL: pgsql/src/backend/postmaster/autovacuum.c,v 1.50 2007-06-13 21:24:55 alvherre Exp $
  *
  *-------------------------------------------------------------------------
  */
*************** typedef struct WorkerInfoData
*** 158,185 ****
  typedef struct WorkerInfoData *WorkerInfo;
  
  /*-------------
   * The main autovacuum shmem struct.  On shared memory we store this main
   * struct and the array of WorkerInfo structs.  This struct keeps:
   *
   * av_launcherpid    the PID of the autovacuum launcher
   * av_freeWorkers    the WorkerInfo freelist
   * av_runningWorkers the WorkerInfo non-free queue
   * av_startingWorker pointer to WorkerInfo currently being started (cleared by
   *                   the worker itself as soon as it's up and running)
-  * av_rebalance      true when a worker determines that cost limits must be
-  *                   rebalanced
   *
!  * This struct is protected by AutovacuumLock.
   *-------------
   */
  typedef struct
  {
      pid_t        av_launcherpid;
      SHMEM_OFFSET av_freeWorkers;
      SHM_QUEUE    av_runningWorkers;
      SHMEM_OFFSET av_startingWorker;
-     bool         av_rebalance;
  } AutoVacuumShmemStruct;
  
  static AutoVacuumShmemStruct *AutoVacuumShmem;
--- 194,233 ----
  typedef struct WorkerInfoData *WorkerInfo;
  
+ /*
+  * Possible signals received by the launcher from remote processes.  These are
+  * stored atomically in shared memory so that other processes can set them
+  * without locking.
+  */
+ typedef enum
+ {
+     AutoVacForkFailed,          /* failed trying to start a worker */
+     AutoVacRebalance,           /* rebalance the cost limits */
+     AutoVacNumSignals           /* must be last */
+ } AutoVacuumSignal;
+ 
  /*-------------
   * The main autovacuum shmem struct.  On shared memory we store this main
   * struct and the array of WorkerInfo structs.  This struct keeps:
   *
+  * av_signal         set by other processes to indicate various conditions
   * av_launcherpid    the PID of the autovacuum launcher
   * av_freeWorkers    the WorkerInfo freelist
   * av_runningWorkers the WorkerInfo non-free queue
   * av_startingWorker pointer to WorkerInfo currently being started (cleared by
   *                   the worker itself as soon as it's up and running)
   *
!  * This struct is protected by AutovacuumLock, except for av_signal and parts
!  * of the worker list (see above).
   *-------------
   */
  typedef struct
  {
+     sig_atomic_t av_signal[AutoVacNumSignals];
      pid_t        av_launcherpid;
      SHMEM_OFFSET av_freeWorkers;
      SHM_QUEUE    av_runningWorkers;
      SHMEM_OFFSET av_startingWorker;
  } AutoVacuumShmemStruct;
  
  static AutoVacuumShmemStruct *AutoVacuumShmem;
*************** StartAutoVacLauncher(void)
*** 316,344 ****
  
  /*
   * Main loop for the autovacuum launcher process.
-  *
-  * The signalling between launcher and worker is as follows:
-  *
-  * When the worker has finished starting up, it stores its PID in wi_workerpid
-  * and sends a SIGUSR1 signal to the launcher.  The launcher then knows that
-  * the postmaster is ready to start a new worker.  We do it this way because
-  * otherwise we risk calling SendPostmasterSignal() when the postmaster hasn't
-  * yet processed the last one, in which case the second signal would be lost.
-  * This is only useful when two workers need to be started close to one
-  * another, which should be rare but it's possible.
-  *
-  * When a worker exits, it resets the WorkerInfo struct and puts it back into
-  * the free list.  If there is no free worker slot, it will also signal the
-  * launcher, which then wakes up and can launch a new worker if it needs to.
-  * Note that we only need to do it when there's no free worker slot, because
-  * otherwise there is no need -- the launcher would be awakened normally per
-  * schedule.
-  *
-  * There is a potential problem if, for some reason, a worker starts and is not
-  * able to bootstrap itself correctly.  To prevent this situation from starving
-  * the whole system, the launcher checks the launch time of the "starting
-  * worker".  If it's too old (older than autovacuum_naptime seconds), it resets
-  * the worker entry and puts it back into the free list.
   */
  NON_EXEC_STATIC void
  AutoVacLauncherMain(int argc, char *argv[])
--- 364,369 ----
*************** AutoVacLauncherMain(int argc, char *argv
*** 547,552 ****
--- 572,581 ----
              got_SIGHUP = false;
              ProcessConfigFile(PGC_SIGHUP);
  
+             /* shutdown requested in config file */
+             if (!autovacuum_start_daemon)
+                 break;
+ 
              /* rebalance in case the default cost parameters changed */
              LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
              autovac_balance_cost();
*************** AutoVacLauncherMain(int argc, char *argv
*** 556,640 ****
              rebuild_database_list(InvalidOid);
          }
  
!         /* a worker started up or finished */
          if (got_SIGUSR1)
          {
              got_SIGUSR1 = false;
  
              /* rebalance cost limits, if needed */
!             if (AutoVacuumShmem->av_rebalance)
              {
                  LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
!                 AutoVacuumShmem->av_rebalance = false;
                  autovac_balance_cost();
                  LWLockRelease(AutovacuumLock);
              }
-         }
- 
-         /*
-          * There are some conditions that we need to check before trying to
-          * start a launcher.  First, we need to make sure that there is a
-          * launcher slot available.  Second, we need to make sure that no other
-          * worker is still starting up.
-          */
- 
-         LWLockAcquire(AutovacuumLock, LW_SHARED);
- 
-         can_launch = (AutoVacuumShmem->av_freeWorkers != INVALID_OFFSET);
- 
-         if (can_launch && AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
-         {
-             WorkerInfo    worker = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
!             if (current_time == 0)
!                 current_time = GetCurrentTimestamp();
! 
!             /*
!              * We can't launch another worker when another one is still
!              * starting up, so just sleep for a bit more; that worker will wake
!              * us up again as soon as it's ready.  We will only wait
!              * autovacuum_naptime seconds for this to happen however.  Note
!              * that failure to connect to a particular database is not a
!              * problem here, because the worker removes itself from the
!              * startingWorker pointer before trying to connect; only low-level
!              * problems, like fork() failure, can get us here.
!              */
!             if (TimestampDifferenceExceeds(worker->wi_launchtime, current_time,
!                                            autovacuum_naptime * 1000))
!             {
!                 LWLockRelease(AutovacuumLock);
!                 LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
!                 /*
!                  * No other process can put a worker in starting mode, so if
!                  * startingWorker is still INVALID after exchanging our lock,
!                  * we assume it's the same one we saw above (so we don't
!                  * recheck the launch time).
!                  */
!                 if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
!                 {
!                     worker = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
!                     worker->wi_dboid = InvalidOid;
!                     worker->wi_tableoid = InvalidOid;
!                     worker->wi_workerpid = 0;
!                     worker->wi_launchtime = 0;
!                     worker->wi_links.next = AutoVacuumShmem->av_freeWorkers;
!                     AutoVacuumShmem->av_freeWorkers = MAKE_OFFSET(worker);
!                     AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
!                 }
!             }
!             else
              {
                  /*
!                  * maybe the postmaster neglected this start signal --
!                  * resend it.  Note: the constraints in
!                  * launcher_determine_sleep keep us from delivering signals too
!                  * quickly (at most once every 100ms).
                   */
                  SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
!                 can_launch = false;
              }
          }
!         LWLockRelease(AutovacuumLock);    /* either shared or exclusive */
  
          if (can_launch)
          {
--- 585,633 ----
              rebuild_database_list(InvalidOid);
          }
  
!         /* a worker finished */
          if (got_SIGUSR1)
          {
              got_SIGUSR1 = false;
  
              /* rebalance cost limits, if needed */
!             if (AutoVacuumShmem->av_signal[AutoVacRebalance])
              {
                  LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
!                 AutoVacuumShmem->av_signal[AutoVacRebalance] = false;
                  autovac_balance_cost();
                  LWLockRelease(AutovacuumLock);
              }
  
!             if (AutoVacuumShmem->av_signal[AutoVacForkFailed])
              {
                  /*
!                  * If the postmaster failed to start a new worker, we sleep
!                  * for a little while and resend the signal.  The new worker's
!                  * state is still in memory, so this is sufficient.  After
!                  * that, we restart the main loop.
!                  *
!                  * XXX should we put a limit to the number of times we retry?
!                  * I don't think it makes much sense, because a future start
!                  * of a worker will continue to fail in the same way.
                   */
+                 AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
+                 pg_usleep(100000L);        /* 100ms */
                  SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
!                 continue;
              }
          }
! 
!         /*
!          * We cannot launch a worker if there are not free worker slots, or if
!          * there is already a worker starting up.
!          */
!         can_launch = true;
!         LWLockAcquire(AutovacuumLock, LW_SHARED);
!         if (AutoVacuumShmem->av_freeWorkers == INVALID_OFFSET ||
!             AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
!             can_launch = false;
!         LWLockRelease(AutovacuumLock);
  
          if (can_launch)
          {
*************** launch_worker(TimestampTz now)
*** 1197,1202 ****
--- 1190,1206 ----
      }
  }
  
+ /*
+  * Called from postmaster to signal a failure to fork a process to become
+  * worker.  The postmaster should kill(SIGUSR1) the launcher shortly
+  * after calling this function.
+  */
+ void
+ AutoVacWorkerFailed(void)
+ {
+     AutoVacuumShmem->av_signal[AutoVacForkFailed] = true;
+ }
+ 
  /* SIGHUP: set flag to re-read config file at next convenient time */
  static void
  avl_sighup_handler(SIGNAL_ARGS)
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1435,1442 ****
      LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
  
      /*
!      * beware of startingWorker being INVALID; this could happen if the
!      * launcher thinks we've taking too long to start.
      */
      if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
      {
--- 1439,1446 ----
      LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
  
      /*
!      * beware of startingWorker being INVALID; this should not happen but keep
!      * the check for safety
      */
      if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
      {
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1463,1468 ****
--- 1467,1473 ----
      else
      {
          /* no worker entry for me, go away */
+         elog(WARNING, "autovacuum worker started without a worker entry");
          dbid = InvalidOid;
          LWLockRelease(AutovacuumLock);
      }
*************** FreeWorkerInfo(int code, Datum arg)
*** 1525,1543 ****
          LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
  
          /*
!          * If this worker shuts down when there is no free worker slot, wake
!          * the launcher up so that he can launch a new worker immediately if
!          * required.  We only save the launcher's PID in local memory here --
!          * the actual signal will be sent when the PGPROC is recycled, because
!          * that is when the new worker can actually be launched.
          *
          * We somewhat ignore the risk that the launcher changes its PID
          * between we reading it and the actual kill; we expect ProcKill to be
          * called shortly after us, and we assume that PIDs are not reused too
          * quickly after a process exits.
          */
!         if (AutoVacuumShmem->av_freeWorkers == INVALID_OFFSET)
!             AutovacuumLauncherPid = AutoVacuumShmem->av_launcherpid;
  
          SHMQueueDelete(&MyWorkerInfo->wi_links);
          MyWorkerInfo->wi_links.next = AutoVacuumShmem->av_freeWorkers;
--- 1530,1547 ----
          LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
  
          /*
!          * Wake the launcher up so that he can launch a new worker immediately
!          * if required.  We only save the launcher's PID in local memory here;
!          * the actual signal will be sent when the PGPROC is recycled.  Note
!          * that we always do this, so that the launcher can rebalance the cost
!          * limit setting of the remaining workers.
          *
          * We somewhat ignore the risk that the launcher changes its PID
          * between we reading it and the actual kill; we expect ProcKill to be
          * called shortly after us, and we assume that PIDs are not reused too
          * quickly after a process exits.
          */
!         AutovacuumLauncherPid = AutoVacuumShmem->av_launcherpid;
  
          SHMQueueDelete(&MyWorkerInfo->wi_links);
          MyWorkerInfo->wi_links.next = AutoVacuumShmem->av_freeWorkers;
*************** FreeWorkerInfo(int code, Datum arg)
*** 1556,1562 ****
           * now that we're inactive, cause a rebalancing of the surviving
           * workers
           */
!         AutoVacuumShmem->av_rebalance = true;
          LWLockRelease(AutovacuumLock);
      }
  }
--- 1560,1566 ----
           * now that we're inactive, cause a rebalancing of the surviving
           * workers
           */
!         AutoVacuumShmem->av_signal[AutoVacRebalance] = true;
          LWLockRelease(AutovacuumLock);
      }
  }
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.527
diff -c -p -r1.527 postmaster.c
*** src/backend/postmaster/postmaster.c    22 Mar 2007 19:53:30 -0000    1.527
--- src/backend/postmaster/postmaster.c    18 Jun 2007 21:00:56 -0000
*************** StartAutovacuumWorker(void)
*** 3830,3861 ****
          return;
  
      bn = (Backend *) malloc(sizeof(Backend));
!     if (!bn)
      {
!         ereport(LOG,
!                 (errcode(ERRCODE_OUT_OF_MEMORY),
!                  errmsg("out of memory")));
!         return;
!     }
! 
!     bn->pid = StartAutoVacWorker();
!     bn->is_autovacuum = true;
!     /* we don't need a cancel key */
!     if (bn->pid > 0)
!     {
!         DLAddHead(BackendList, DLNewElem(bn));
  #ifdef EXEC_BACKEND
!         ShmemBackendArrayAdd(bn);
  #endif
!     }
!     else
!     {
!         /* not much we can do */
!         ereport(LOG,
!                 (errmsg("could not fork new process for autovacuum: %m")));
          free(bn);
      }
  }
  
  /*
--- 3830,3863 ----
          return;
  
      bn = (Backend *) malloc(sizeof(Backend));
!     if (bn)
      {
!         bn->pid = StartAutoVacWorker();
!         bn->is_autovacuum = true;
!         /* we don't need a cancel key */
!         if (bn->pid > 0)
!         {
!             /* FIXME -- unchecked memory allocation here */
!             DLAddHead(BackendList, DLNewElem(bn));
  #ifdef EXEC_BACKEND
!             ShmemBackendArrayAdd(bn);
  #endif
!             /* all OK */
!             return;
!         }
! 
!         /* failed, fall through to report */
!         /* XXX the reason of the failure is not reported??? */
          free(bn);
      }
+     else
+         elog(LOG, "out of memory");
+ 
+     /* report the failure to the launcher */
+     AutoVacWorkerFailed();
+     if (AutoVacPID != 0)
+         kill(AutoVacPID, SIGUSR1);
  }
  
  /*
Index: src/include/postmaster/autovacuum.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/postmaster/autovacuum.h,v
retrieving revision 1.10
diff -c -p -r1.10 autovacuum.h
*** src/include/postmaster/autovacuum.h    18 Apr 2007 16:44:18 -0000    1.10
--- src/include/postmaster/autovacuum.h    18 Jun 2007 18:37:30 -0000
*************** extern bool IsAutoVacuumWorkerProcess(vo
*** 42,47 ****
--- 42,49 ----
  extern void autovac_init(void);
  extern int StartAutoVacLauncher(void);
  extern int StartAutoVacWorker(void);
+ /* called from postmaster when a worker could not be forked */
+ extern void AutoVacWorkerFailed(void);
  
  /* autovacuum cost-delay balancer */
  extern void AutoVacuumUpdateDelay(void);