Hi,
Alvaro Herrera wrote:
Yeah. For what I need, the launcher just needs to know when a worker
has finished and how many workers there are.
Oh, so it's not that much less communication. My replication manager also
needs to know when a worker dies. You said you are using a signal from
manager to postmaster to request a worker to be forked. How do you do
the other part, where the postmaster needs to tell the launcher which
worker terminated?
For Postgres-R, I'm currently questioning if I shouldn't merge the
replication manager process with the postmaster. Of course, that would
violate the "postmaster does not touch shared memory" constraint.
I suggest you don't. Reliability from Postmaster is very important.
Yes, so? As long as I can't restart the replication manager, but
operation of the whole DBMS relies on it, I have to take the postmaster
down as soon as it detects a crashed replication manager.
So I still argue that reliability would improve over the status quo if
I merged these two processes (because there is less code for
communication between the two).
Of course, the other way to gain reliability would be to make the
replication manager restartable. But restarting the replication manager
means recovering data from other nodes in the cluster, thus a lot of
network traffic. Needless to say, this is quite an expensive operation.
That's why I'm questioning whether that's the behavior we want. Isn't it
better to force the administrators to look into the issue and probably
replace a broken node, instead of having one node run amok by
requesting recovery over and over again, possibly forcing crashes of
other nodes, too, because of the additional load for recovery?
But it would make some things a lot easier:
* What if the launcher/manager dies (but you potentially still have
active workers)?
Maybe, for autovacuum you can simply restart the launcher and that
one detects workers from shmem.
With replication, I certainly have to take down the postmaster as
well, as we are certainly out of sync and can't simply restart the
replication manager. So in that case, no postmaster can run without a
replication manager and vice versa. Why not make it one single
process, then?
Well, the point of the postmaster is that it can notice when one process
dies and take appropriate action. When a backend dies, the postmaster
closes all others. But if the postmaster crashes due to a bug in the
manager (due to both being integrated in a single process), how do you
close the backends? There's no one to do it.
That's a point.
But again, as long as the replication manager won't be able to restart,
you gain nothing by closing backends on a crashed node.
In my case, the launcher is not critical. It can die and the postmaster
should just start a new one without much noise. A worker is critical
because it's connected to tables; it's as critical as a regular backend.
So if a worker dies, the postmaster must take everyone down and cause a
restart. This is pretty easy to do.
Yeah, that's the main difference, and I see why your approach makes
perfect sense for the autovacuum case.
In contrast, the replication manager is critical (to one node), and a
restart is expensive (for the whole cluster).
* Startup races: depending on how you start workers, the launcher/
manager may get a "database is starting up" error when requesting
the postmaster to fork backends.
That probably also applies to autovacuum, as those workers shouldn't
run concurrently with a startup process. But maybe there are other
means of ensuring that no autovacuum gets triggered during startup?
Oh, this is very easy as well. In my case the launcher just sets a
database OID to be processed in shared memory, and then calls
SendPostmasterSignal with a particular value. The postmaster must only
check this signal within ServerLoop, which means it won't act on it
(i.e., won't start a worker) until the startup process has finished.
It seems like your launcher is perfectly fine with requesting workers
and not getting them. The replication manager currently isn't. Maybe I
should make it more fault tolerant in that regard...
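The deferral described above can be sketched as a small standalone C
model. This is only an illustration under stated assumptions, not the
actual postmaster code: SketchShmem, launcher_request_worker and
postmaster_tick are hypothetical names, a plain boolean stands in for
the signal raised via SendPostmasterSignal, and startup_finished models
the point from which ServerLoop starts acting on requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;            /* stand-in for PostgreSQL's Oid type */

/* Hypothetical shared state between launcher and postmaster. */
typedef struct {
    Oid  requested_db;               /* database the launcher wants processed */
    bool worker_request_pending;     /* stand-in for the postmaster signal */
    bool startup_finished;           /* set once the startup process is done */
    int  workers_started;            /* how many workers were forked so far */
    Oid  last_started_db;            /* database of the most recent worker */
} SketchShmem;

/* Launcher side: record the target database, then raise the request. */
void launcher_request_worker(SketchShmem *shm, Oid dboid)
{
    assert(shm != NULL);
    shm->requested_db = dboid;
    shm->worker_request_pending = true;
}

/*
 * Postmaster side: one pass of the (modeled) server loop.
 * Returns true if a worker was started during this pass.
 * While startup is still running the request is left pending, not lost.
 */
bool postmaster_tick(SketchShmem *shm)
{
    assert(shm != NULL);
    if (!shm->startup_finished)
        return false;                /* not yet acting on requests */
    if (!shm->worker_request_pending)
        return false;                /* nothing to do */

    shm->worker_request_pending = false;
    shm->last_started_db = shm->requested_db;
    shm->workers_started++;
    return true;
}
```

In this model a request made during startup simply stays pending until
the first pass after startup has finished, so the requester never sees
a "database is starting up" error.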
I guess your problem is that the manager's task is quite a lot more
involved than my launcher's. But in that case, it's even more important
to have them separate.
More involved with what? It does not touch shared memory, it mainly
keeps track of the backends' states (by getting a notice from the
postmaster) and does all the necessary forwarding of messages between
the communication system and the backends. Its main loop is similar to
the postmaster's, mainly consisting of a select().
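For illustration, the core of such a select()-based loop might look
like the following minimal sketch. It is not taken from the replication
manager; wait_for_readable is a hypothetical helper, and the
descriptors would be whatever sockets or pipes connect the manager to
the communication system and the backends.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for any of the given descriptors to become
 * readable. Returns the index (into fds) of the first readable
 * descriptor, or -1 on timeout or error.
 */
int wait_for_readable(const int *fds, int nfds, int timeout_ms)
{
    fd_set         readable;
    struct timeval tv;
    int            maxfd = -1;
    int            i;

    assert(fds != NULL && nfds > 0);

    FD_ZERO(&readable);
    for (i = 0; i < nfds; i++)
    {
        /* select() can only handle descriptors below FD_SETSIZE */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(maxfd + 1, &readable, NULL, NULL, &tv) <= 0)
        return -1;                   /* timeout or error */

    for (i = 0; i < nfds; i++)
    {
        if (FD_ISSET(fds[i], &readable))
            return i;
    }
    return -1;
}
```

A main loop would call this repeatedly and dispatch on the returned
index, forwarding whatever message arrived on that descriptor.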
I don't understand why the manager talks to postmaster. If it doesn't,
well, then there's no concurrency issue gone, because the remote
backends will be talking to *somebody* anyway; be it postmaster, or
manager.
As with your launcher, I only send one message: the worker request. But
the other way around, from the postmaster to the replication manager,
there are also some messages: a "database is ready" message and a
"worker terminated" message. Thinking about handling the restarting
cycle, I would need to add a "database is restarting" message, which
has to be followed by another "database is ready" message.
For sure, the replication manager needs to keep running during a
restarting cycle. And it needs to know the database's state, so as to be
able to decide if it can request workers or not.
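The message flow just described could be tracked with a small state
machine, roughly as in the sketch below. All names (PmMessage,
ManagerState, manager_handle_message, manager_request_worker) are
hypothetical and not taken from the Postgres-R code. It also assumes
that a restarting cycle takes down all workers, which the messages
above do not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical messages from the postmaster to the replication manager. */
typedef enum {
    PM_MSG_DB_READY,
    PM_MSG_DB_RESTARTING,
    PM_MSG_WORKER_TERMINATED
} PmMessage;

/* The database state as seen by the manager. */
typedef enum { DB_STARTING, DB_READY, DB_RESTARTING } DbState;

typedef struct {
    DbState db_state;
    int     active_workers;
} ManagerState;

void manager_init(ManagerState *m)
{
    assert(m != NULL);
    m->db_state = DB_STARTING;
    m->active_workers = 0;
}

/* Update the manager's view of the database from a postmaster message. */
void manager_handle_message(ManagerState *m, PmMessage msg)
{
    assert(m != NULL);
    switch (msg)
    {
        case PM_MSG_DB_READY:
            m->db_state = DB_READY;
            break;
        case PM_MSG_DB_RESTARTING:
            m->db_state = DB_RESTARTING;
            m->active_workers = 0;   /* assumption: a restart kills all workers */
            break;
        case PM_MSG_WORKER_TERMINATED:
            if (m->active_workers > 0)
                m->active_workers--;
            break;
    }
}

/*
 * The only message in the other direction: a worker request.
 * Returns false (and sends nothing) unless the database is ready.
 */
bool manager_request_worker(ManagerState *m)
{
    assert(m != NULL);
    if (m->db_state != DB_READY)
        return false;
    m->active_workers++;
    return true;
}
```

With this, the manager keeps running across a restarting cycle and
simply holds back worker requests until the next "database is ready"
message arrives.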
(Maybe your problem is that the manager is not correctly designed. We
can talk about checking that code. I happen to know the Postmaster
process handling code because of my previous work with Autovacuum and
because of Mammoth Replicator.)
Thanks for the offer, I'll get back to that.
I think you're underestimating the postmaster's task.
Maybe, but it certainly loses importance within a cluster, since it
controls only part of the whole database system.
Ok. I have one ready, and it works very well. It only ever starts one
worker -- I have constrained it that way just to keep the current behavior
of a single autovacuum process running at any time. My plan is to get
it submitted for review, and then start working on having it consider
multiple workers and introduce more scheduling smarts.
Sounds like a good plan.
Thank you for your input. You made me rethink some issues and pointed
me to some open questions.
Regards
Markus