Hi,

Alvaro Herrera wrote:
Yeah.  For what I need, the launcher just needs to know when a worker
has finished and how many workers there are.

Oh, so it's not all that much less communication. My replication manager also needs to know when a worker dies. You said you are using a signal from the launcher to the postmaster to request a worker to be forked. How do you do the other part, where the postmaster needs to tell the launcher which worker terminated?

For Postgres-R, I'm currently questioning if I shouldn't merge the replication manager process with the postmaster. Of course, that would violate the "postmaster does not touch shared memory" constraint.

I suggest you don't.  Reliability of the postmaster is very important.

Yes, so? As long as I can't restart the replication manager, but the operation of the whole DBMS relies on it, I have to take the postmaster down as soon as it detects a crashed replication manager.

So I still argue that reliability gets better than the status quo if I merge these two processes (because there is less code for communication between the two).

Of course, the other way to gain reliability would be to make the replication manager restartable. But restarting the replication manager means recovering data from other nodes in the cluster, thus a lot of network traffic. Needless to say, this is quite an expensive operation.

That's why I'm questioning whether that's the behavior we want. Isn't it better to force the administrators to look into the issue and probably replace a broken node, instead of having one node run amok by requesting recovery over and over again, possibly forcing crashes of other nodes, too, because of the additional load from recovery?

But it would make some things a lot easier:

 * What if the launcher/manager dies (but you potentially still have
   active workers)?

   Maybe, for autovacuum you can simply restart the launcher and that
   one detects workers from shmem.

   With replication, I certainly have to take down the postmaster as
   well, as we are certainly out of sync and can't simply restart the
   replication manager. So in that case, no postmaster can run without a
   replication manager and vice versa. Why not make it one single
   process, then?

Well, the point of the postmaster is that it can notice when one process
dies and take appropriate action.  When a backend dies, the postmaster
closes all others.  But if the postmaster crashes due to a bug in the
manager (due to both being integrated in a single process), how do you
close the backends?  There's no one to do it.

That's a point.

But again, as long as the replication manager can't be restarted, you gain nothing by closing backends on a crashed node.

In my case, the launcher is not critical.  It can die and the postmaster
should just start a new one without much noise.  A worker is critical
because it's connected to tables; it's as critical as a regular backend.
So if a worker dies, the postmaster must take everyone down and cause a
restart.  This is pretty easy to do.
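
In code, that policy boils down to something like the following sketch of the postmaster's SIGCHLD handling (all names here are placeholders, not the real symbols; the actual logic lives in postmaster.c's reaper() and HandleChildCrash()):

#include <sys/types.h>
#include <sys/wait.h>

/* Placeholders for illustration only, not the real postmaster symbols. */
extern pid_t LauncherPID;               /* launcher pid, 0 if not running */
extern void  RequestCrashRestart(void); /* stands in for HandleChildCrash() */

static void
reaper_sketch(void)
{
    pid_t pid;
    int   exitstatus;

    while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
    {
        if (pid == LauncherPID)
        {
            /* Launcher is not critical: forget it, ServerLoop will
             * simply fork a new one without much noise. */
            LauncherPID = 0;
            continue;
        }

        /* A worker (or regular backend) was connected to shared memory,
         * so an abnormal exit means: take everyone down and restart. */
        if (WIFSIGNALED(exitstatus) ||
            (WIFEXITED(exitstatus) && WEXITSTATUS(exitstatus) != 0))
            RequestCrashRestart();
    }
}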

Yeah, that's the main difference, and I see why your approach makes perfect sense for the autovacuum case.

In contrast, the replication manager is critical (to one node), and a restart is expensive (for the whole cluster).

 * Startup races: depending on how you start workers, the launcher/
   manager may get a "database is starting up" error when requesting
   the postmaster to fork backends.
   That probably also applies to autovacuum, as those workers shouldn't
   work concurrently with the startup process. But maybe there are other
   means of ensuring that no autovacuum gets triggered during startup?

Oh, this is very easy as well.  In my case the launcher just sets a
database OID to be processed in shared memory, and then calls
SendPostmasterSignal with a particular value.  The postmaster must only
check this signal within ServerLoop, which means it won't act on it
(i.e., won't start a worker) until the startup process has finished.
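
A minimal sketch of that request path, with a placeholder shared-memory struct and a placeholder signal reason (the actual names in the autovacuum code may well differ):

#include "postgres.h"
#include "storage/pmsignal.h"

/* Placeholder shared-memory struct; the real launcher keeps more state. */
typedef struct
{
    Oid av_startingDB;          /* database the launcher wants processed */
} AutoVacShmemSketch;

extern AutoVacShmemSketch *AutoVacShmem;

static void
request_worker_sketch(Oid dbid)
{
    /* Publish the database OID for the postmaster to pick up... */
    AutoVacShmem->av_startingDB = dbid;

    /* ...then poke the postmaster.  It only checks these flags from
     * within ServerLoop, so nothing happens until startup is done. */
    SendPostmasterSignal(PMSIGNAL_START_WORKER);    /* placeholder reason */
}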

It seems like your launcher is perfectly fine with requesting workers and not getting them. The replication manager currently isn't. Maybe I should make it more fault tolerant in that regard...

I guess your problem is that the manager's task is quite a lot more
involved than my launcher's.  But in that case, it's even more important
to have them separate.

More involved with what? It does not touch shared memory; it mainly keeps track of the backends' states (by getting notices from the postmaster) and does all the necessary forwarding of messages between the communication system and the backends. Its main loop is similar to the postmaster's, mainly consisting of a select().
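
Schematically, that loop is no more than the following skeleton (the helper functions are made-up names, just to show the shape):

#include <sys/select.h>

/* Made-up helpers; the real manager has more state and error handling. */
extern int  add_gcs_and_backend_fds(fd_set *set);   /* returns highest fd */
extern int  message_from_gcs(fd_set *set);
extern int  message_from_backend(fd_set *set);
extern void forward_to_backend(void);
extern void forward_to_gcs(void);

static void
manager_main_loop(void)
{
    for (;;)
    {
        fd_set rfds;
        int    maxfd;

        FD_ZERO(&rfds);
        maxfd = add_gcs_and_backend_fds(&rfds);

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;                   /* interrupted by a signal, retry */

        if (message_from_gcs(&rfds))
            forward_to_backend();       /* remote change sets coming in */

        if (message_from_backend(&rfds))
            forward_to_gcs();           /* local change sets going out */

        /* ...plus handling the notices from the postmaster about
         * worker startup and termination. */
    }
}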

I don't understand why the manager talks to the postmaster.  If it
doesn't, well, then no concurrency issue goes away, because the remote
backends will be talking to *somebody* anyway; be it the postmaster, or
the manager.

As with your launcher, I only send one message: the worker request. But the other way around, from the postmaster to the replication manager, there are also some messages: a "database is ready" message and a "worker terminated" message. Thinking about handling the restarting cycle, I would need to add a "database is restarting" message, which has to be followed by another "database is ready" message.

For sure, the replication manager needs to keep running during a restarting cycle. And it needs to know the database's state, so as to be able to decide if it can request workers or not.
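
Summed up, the message set between postmaster and replication manager would look roughly like this (names provisional):

typedef enum
{
    RMGR_MSG_WORKER_REQUEST,        /* manager -> postmaster: fork a worker  */
    RMGR_MSG_DATABASE_READY,        /* postmaster -> manager: ready to serve */
    RMGR_MSG_WORKER_TERMINATED,     /* postmaster -> manager: a worker died  */
    RMGR_MSG_DATABASE_RESTARTING    /* postmaster -> manager: crash restart,
                                     * followed by another DATABASE_READY    */
} RmgrMessageType;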

(Maybe your problem is that the manager is not correctly designed.  We
can talk about checking that code.  I happen to know the Postmaster
process handling code because of my previous work with Autovacuum and
because of Mammoth Replicator.)

Thanks for the offer, I'll get back to that.

I think you're underestimating the postmaster's task.

Maybe, but it certainly loses importance within a cluster, since it controls only part of the whole database system.

Ok.  I have one ready, and it works very well.  It only ever starts one
worker -- I have constrained it that way just to keep the current behavior
of a single autovacuum process running at any time.  My plan is to get
it submitted for review, and then start working on having it consider
multiple workers and introduce more scheduling smarts.

Sounds like a good plan.

Thank you for your input. You made me rethink some issues and pointed me to some open questions.

Regards

Markus
