Hi Markus,

Markus Schiltknecht wrote:
> > Alvaro Herrera wrote:
> > 1. There will be two kinds of processes, "autovacuum launcher" and
> > "autovacuum worker".
>
> Sounds similar to what I do in Postgres-R: one replication manager and
> several "replication workers". Those are called "remote backends" (which
> is somewhat of an unfortunate name, IMO.)

Oh, yeah, I knew about those and forgot to check them.

> > 6. Launcher will start a worker using the following protocol:
> >    - Set up information on what to run on shared memory
> >    - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
> >    - Postmaster will react by starting a worker, and registering it
> >      very similarly to a regular backend, so it can be shut down
> >      easily when appropriate.
> >      (Thus launcher will not be informed right away when worker dies)
> >    - Worker will examine shared memory to know what to do, clear the
> >      request, and send a signal to Launcher
> >    - Launcher wakes up and can start another one if appropriate
>
> It looks like you need much less communication between the launcher and
> the workers, probably also less between the postmaster and the launcher.

Yeah. For what I need, the launcher just needs to know when a worker has
finished and how many workers there are.

> For Postgres-R, I'm currently questioning if I shouldn't merge the
> replication manager process with the postmaster. Of course, that would
> violate the "postmaster does not touch shared memory" constraint.

I suggest you don't. The postmaster's reliability is very important.

> But it would make some things a lot easier:
>
> * What if the launcher/manager dies (but you potentially still have
>   active workers)?
>
>   Maybe, for autovacuum you can simply restart the launcher and that
>   one detects workers from shmem.
>
>   With replication, I certainly have to take down the postmaster as
>   well, as we are certainly out of sync and can't simply restart the
>   replication manager. So in that case, no postmaster can run without a
>   replication manager and vice versa.
> Why not make it one single process, then?

Well, the point of the postmaster is that it can notice when one process
dies and take appropriate action. When a backend dies, the postmaster
closes all others. But if the postmaster crashes due to a bug in the
manager (because both are integrated in a single process), who closes
the backends? There's no one left to do it.

When the logger process dies, the postmaster just starts a new one. But
when the bgwriter dies, it must cause a restart cycle as well. The
postmaster knows which process died, so it knows how to act. If the
manager dies, the postmaster is certainly able to stop all other
processes and restart the whole thing.

In my case, the launcher is not critical. It can die and the postmaster
should just start a new one without much noise. A worker is critical
because it's connected to tables; it's as critical as a regular backend.
So if a worker dies, the postmaster must take everyone down and cause a
restart. This is pretty easy to do.

> * Startup races: depending on how you start workers, the launcher/
>   manager may get a "database is starting up" error when requesting
>   the postmaster to fork backends.
>   That probably also applies to autovacuum, as those workers shouldn't
>   work concurrently to a startup process. But maybe there are other
>   means of ensuring that no autovacuum gets triggered during startup?

Oh, this is very easy as well. In my case the launcher just sets a
database OID to be processed in shared memory, and then calls
SendPostmasterSignal with a particular value. The postmaster only checks
this signal within ServerLoop, which means it won't act on it (i.e.,
won't start a worker) until the startup process has finished.

The worker is very much like a regular backend. It starts up, and then
checks this shared memory. If there's a database OID in there, it
removes the OID from shared memory, then connects to the database and
does a vacuum cycle.
> * Simpler debugging: one process less which could fail, and a whole
>   lot of concurrency issues (like deadlocks or invalid IPC messages)
>   are gone.

I guess your problem is that the manager's task is quite a lot more
involved than my launcher's. But in that case, it's even more important
to have them separate. I don't understand why the manager talks to the
postmaster. And if it doesn't, then no concurrency issue goes away,
because the remote backends will be talking to *somebody* anyway, be it
the postmaster or the manager.

(Maybe your problem is that the manager is not correctly designed. We
can talk about checking that code. I happen to know the postmaster's
process-handling code because of my previous work with autovacuum and
because of Mammoth Replicator.)

> So, why do you want to add a special launcher process? Why can't the
> postmaster take care of launching autovacuum workers? It should be
> possible to let the postmaster handle *that* part of the shared
> memory, as it can simply clean it up. Corruptions wouldn't matter, so
> I don't see a problem with that.
>
> (Probably I'm too much focussed on my case, the replication manager.)

I think you're underestimating the postmaster's task.

> > Does this raise some red flags? It seems straightforward enough to
> > me; I'll submit a patch implementing this,
>
> Looking forward to that one.

Ok. I have one ready, and it works very well. It only ever starts one
worker -- I have constrained it that way just to keep the current
behavior of a single autovacuum process running at any time. My plan is
to get it submitted for review, and then start working on having it
consider multiple workers and introducing more scheduling smarts.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.