Alvaro Herrera wrote:
> I haven't done that yet, since the current incarnation does not need it.
> But I have considered using some signal like SIGUSR1 to mean "something
> changed in your processes, look into your shared memory". The
> autovacuum shared memory area would contain the PIDs (or maybe PGPROC
> pointers?) of the workers; so when the launcher goes to check, it
> notices that one worker is no longer there, meaning that it must have
> terminated its job.
Meaning the launcher must keep a list of currently known worker PIDs and
compare that to the list in shared memory. This is doable, but quite a
lot of code for something the postmaster gets for free (i.e. SIGCHLD).
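That bookkeeping amounts to a set difference between the launcher's known worker PIDs and the PID array in shared memory. A minimal sketch (all names hypothetical, not actual PostgreSQL code, and ignoring locking of the shared area):

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical sketch: compare the launcher's view of its workers with
 * the PID array in autovacuum shared memory.  Returns the PID of the
 * first known worker that is no longer present in shared memory
 * (i.e. has terminated), or 0 if all are still there. */
static pid_t
find_terminated_worker(const pid_t *known, size_t nknown,
                       const pid_t *shmem, size_t nshmem)
{
    for (size_t i = 0; i < nknown; i++)
    {
        int still_there = 0;

        for (size_t j = 0; j < nshmem; j++)
        {
            if (shmem[j] == known[i])
            {
                still_there = 1;
                break;
            }
        }
        if (!still_there)
            return known[i];
    }
    return 0;
}
```

This is exactly the extra code the postmaster avoids: SIGCHLD tells it directly which child exited, with no list comparison at all.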
> Sure you do -- they won't corrupt anything :-) Plus, what use are
> running backends in a multimaster environment if they can't communicate
> with the outside? Much better would be, AFAICS, to shut everyone down
> so that the users can connect to a working node.
You are right here. I'll have to recheck my code and make sure I 'take
down' the postmaster in a decent way (i.e. make it terminate its
children immediately, so that they can't commit anymore).
>> More involved with what? It does not touch shared memory; it mainly
>> keeps track of the backends' states (by getting a notice from the
>> postmaster) and does all the necessary forwarding of messages between
>> the communication system and the backends. Its main loop is similar to
>> the postmaster's, mainly consisting of a select().
> I meant "more complicated". And if it has to listen on a socket and
> forward messages to remote backends, it certainly is a lot more
> complicated than the current autovac launcher.
That may well be. My point was that my replication manager is so
similar to the postmaster that it is a real PITA to do that much coding
just to make it a separate process.
>> For sure, the replication manager needs to keep running during a
>> restarting cycle. And it needs to know the database's state, so as to
>> be able to decide if it can request workers or not.
> I think this would be pretty easy to do if you made the remote backends
> keep state in shared memory. The manager just needs to get a signal to
> know that it should check the shared memory. This can be arranged
> easily: just have the remote backends signal the postmaster, and have
> the postmaster signal the manager. Alternatively, have the manager PID
> stored in shared memory and have the remote backends signal (SIGUSR1 or
> some such) the manager. (bgwriter does this: it announces its PID in
> shared memory, and the backends signal it when they want a CHECKPOINT.)
Sounds like we'll run out of signals soon. ;-)
I also have to pass around data (writesets), which is why I've come up
with that IMessage stuff. It's a per-process message queue in shared
memory, using SIGUSR1 to signal new messages. It works, but as I said, I
found myself adding messages for all the postmaster events, so I've
really begun to question what to do in which process.
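For illustration, the queue mechanics of such an IMessage scheme might look like this (a hypothetical sketch, not the actual Postgres-R code; the real queue would live in shared memory, need locking, and each append would be followed by a SIGUSR1 to the target process):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a per-process message queue: each message
 * carries a type tag and a payload (e.g. a writeset).  A fixed-size
 * ring buffer keeps the sketch self-contained. */

#define IMSG_QUEUE_LEN  8
#define IMSG_DATA_MAX  64

typedef enum { IMSGT_NONE, IMSGT_WRITESET, IMSGT_TERM } IMessageType;

typedef struct
{
    IMessageType type;
    size_t       len;
    char         data[IMSG_DATA_MAX];
} IMessage;

static IMessage queue[IMSG_QUEUE_LEN];
static size_t   head, tail;     /* head: next read, tail: next write */

/* Append a message; returns 0 if the queue is full or the payload is
 * too large.  In the real scheme this would be followed by
 * kill(target_pid, SIGUSR1). */
static int
imessage_put(IMessageType type, const void *data, size_t len)
{
    if ((tail + 1) % IMSG_QUEUE_LEN == head || len > IMSG_DATA_MAX)
        return 0;
    queue[tail].type = type;
    queue[tail].len = len;
    memcpy(queue[tail].data, data, len);
    tail = (tail + 1) % IMSG_QUEUE_LEN;
    return 1;
}

/* Consume the oldest pending message, or NULL if none is pending;
 * what the receiving process would do after seeing the SIGUSR1 flag. */
static IMessage *
imessage_get(void)
{
    IMessage *msg;

    if (head == tail)
        return NULL;
    msg = &queue[head];
    head = (head + 1) % IMSG_QUEUE_LEN;
    return msg;
}
```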
Again, thanks for your inputs.