Re: [HACKERS] autovacuum process handling

2007-01-27 Thread Markus Schiltknecht

Hi,

Alvaro Herrera wrote:

I haven't done that yet, since the current incarnation does not need it.
But I have considered using some signal like SIGUSR1 to mean "something
changed in your processes, look into your shared memory".  The
autovacuum shared memory area would contain PIDs (or maybe PGPROC
pointers?) of workers; so when the launcher goes to check, it notices
that one worker is no longer there, meaning that it must have
terminated its job.


Meaning the launcher must keep a list of currently known worker PIDs and
compare that to the list in shared memory. This is doable, but quite a
lot of code for something the postmaster gets for free (i.e. SIGCHLD).


Sure you do -- they won't corrupt anything :-)  Plus, what use are
running backends in a multimaster environment, if they can't communicate
with the outside?  Much better would be, AFAICS, to shut everyone down
so that the users can connect to a working node.


You are right here. I'll have to recheck my code and make sure I 'take
down' the postmaster in a decent way (i.e. make it terminate its
children immediately, so that they can't commit anymore).

More involved with what? It does not touch shared memory, it mainly 
keeps track of the backends' states (by getting a notice from the 
postmaster) and does all the necessary forwarding of messages between 
the communication system and the backends. Its main loop is similar to 
the postmaster's, mainly consisting of a select().


I meant more complicated.  And if it has to listen on a socket and
forward messages to remote backends, it certainly is a lot more
complicated than the current autovac launcher.


That may well be. My point was that my replication manager is so 
similar to the postmaster that it is a real PITA to do that much coding 
just to make it a separate process.


For sure, the replication manager needs to keep running during a 
restarting cycle. And it needs to know the database's state, so as to be 
able to decide if it can request workers or not.


I think this would be pretty easy to do if you made the remote backends
keep state in shared memory.  The manager just needs to get a signal to
know that it should check the shared memory.  This can be arranged
easily: just have the remote backends signal the postmaster, and have
the postmaster signal the manager.  Alternatively, have the manager PID
stored in shared memory and have the remote backends signal (SIGUSR1 or
some such) the manager.  (bgwriter does this: it announces its PID in
shared memory, and the backends signal it when they want a CHECKPOINT).


Sounds like we'll run out of signals soon. ;-)

I also have to pass around data (writesets), which is why I've come up
with that IMessage stuff. It's a per-process message queue in shared
memory, using SIGUSR1 to signal new messages. It works, but as I said, I
found myself adding messages for all the postmaster events, so I've
really begun to question what to do in which process.
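
Purely for illustration, here is a stripped-down sketch of such a per-process
queue; it is not the real IMessage interface, just the general shape implied
above (a queue of messages per recipient process plus a SIGUSR1 to wake it up):

/*
 * Illustrative sketch only: this is not the actual IMessage code, just the
 * general shape described above.  In real shared memory one would use
 * offsets and locking rather than bare pointers.
 */
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct IMessage
{
    struct IMessage *next;      /* next message in the recipient's queue */
    int              type;      /* e.g. writeset, worker request, ... */
    size_t           size;      /* length of the payload following the header */
} IMessage;

typedef struct
{
    pid_t      pid;             /* recipient process */
    IMessage  *head;            /* oldest unread message, NULL if empty */
    IMessage  *tail;            /* newest message */
} IMessageQueue;

/* Append a message to the recipient's queue (caller is assumed to hold the
 * appropriate lock) and send SIGUSR1 so the recipient checks its queue. */
static void
imessage_send(IMessageQueue *q, IMessage *msg)
{
    msg->next = NULL;
    if (q->tail != NULL)
        q->tail->next = msg;
    else
        q->head = msg;
    q->tail = msg;

    kill(q->pid, SIGUSR1);
}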

Again, thanks for your inputs.

Markus





Re: [HACKERS] autovacuum process handling

2007-01-26 Thread Markus Schiltknecht

Hi,

Alvaro Herrera wrote:

Yeah.  For what I need, the launcher just needs to know when a worker
has finished and how many workers there are.


Oh, so it's not that much less communication. My replication manager also 
needs to know when a worker dies. You said you are using a signal from 
the manager to the postmaster to request a worker to be forked. How do you do 
the other part, where the postmaster needs to tell the launcher which 
worker terminated?


For Postgres-R, I'm currently questioning if I shouldn't merge the 
replication manager process with the postmaster. Of course, that would 
violate the "postmaster does not touch shared memory" constraint.


I suggest you don't.  Reliability from Postmaster is very important.


Yes, so? As long as I can't restart the replication manager, but 
operation of the whole DBMS relies on it, I have to take the postmaster 
down as soon as it detects a crashed replication manager.


So I still argue that reliability gets better than the status quo if 
I merge these two processes (because there is less code for communication 
between the two).


Of course, the other way to gain reliability would be to make the 
replication manager restartable. But restarting the replication manager 
means recovering data from other nodes in the cluster, thus a lot of 
network traffic. Needless to say, this is quite an expensive operation.


That's why I'm questioning if that's the behavior we want. Isn't it 
better to force the administrators to look into the issue and probably 
replace a broken node, instead of having one node run amok by 
requesting recovery over and over again, possibly forcing crashes of 
other nodes too, because of the additional load for recovery?



But it would make some things a lot easier:

 * What if the launcher/manager dies (but you potentially still have
   active workers)?

   Maybe, for autovacuum you can simply restart the launcher and that
   one detects workers from shmem.

   With replication, I certainly have to take down the postmaster as
   well, as we are certainly out of sync and can't simply restart the
   replication manager. So in that case, no postmaster can run without a
   replication manager and vice versa. Why not make it one single
   process, then?


Well, the point of the postmaster is that it can notice when one process
dies and take appropriate action.  When a backend dies, the postmaster
closes all others.  But if the postmaster crashes due to a bug in the
manager (due to both being integrated in a single process), how do you
close the backends?  There's no one to do it.


That's a point.

But again, as long as the replication manager won't be able to restart, 
you gain nothing by closing backends on a crashed node.



In my case, the launcher is not critical.  It can die and the postmaster
should just start a new one without much noise.  A worker is critical
because it's connected to tables; it's as critical as a regular backend.
So if a worker dies, the postmaster must take everyone down and cause a
restart.  This is pretty easy to do.


Yeah, that's the main difference, and I see why your approach makes 
perfect sense for the autovacuum case.


In contrast, the replication manager is critical (to one node), and a 
restart is expensive (for the whole cluster).



 * Startup races: depending on how you start workers, the launcher/
   manager may get a "database is starting up" error when requesting
   the postmaster to fork backends.
   That probably also applies to autovacuum, as those workers shouldn't
   run concurrently with the startup process. But maybe there are other
   means of ensuring that no autovacuum gets triggered during startup?


Oh, this is very easy as well.  In my case the launcher just sets a
database OID to be processed in shared memory, and then calls
SendPostmasterSignal with a particular value.  The postmaster must only
check this signal within ServerLoop, which means it won't act on it
(i.e., won't start a worker) until the startup process has finished.


It seems like your launcher is perfectly fine with requesting workers 
and not getting them. The replication manager currently isn't. Maybe I 
should make it more fault tolerant in that regard...



I guess your problem is that the manager's task is quite a lot more
involved than my launcher's.  But in that case, it's even more important
to have them separate.


More involved with what? It does not touch shared memory, it mainly 
keeps track of the backends' states (by getting a notice from the 
postmaster) and does all the necessary forwarding of messages between 
the communication system and the backends. Its main loop is similar to 
the postmaster's, mainly consisting of a select().



I don't understand why the manager talks to postmaster.  If it doesn't,
well, then there's no concurrency issue gone, because the remote
backends will be talking to *somebody* anyway; be it postmaster, or
manager.


As with your launcher, I only send one message: the worker request. But
the other way around, from the postmaster to the replication manager,
there are also some messages: a "database is ready" message and
"worker terminated" messages. Thinking about handling the restarting
cycle, I would need to add a "database is restarting" message, which
has to be followed by another "database is ready" message.

Re: [HACKERS] autovacuum process handling

2007-01-26 Thread Alvaro Herrera
Markus Schiltknecht wrote:
 Hi,
 
 Alvaro Herrera wrote:
 Yeah.  For what I need, the launcher just needs to know when a worker
 has finished and how many workers there are.
 
 Oh, so it's not that much less communication. My replication manager also 
 needs to know when a worker dies. You said you are using a signal from 
 the manager to the postmaster to request a worker to be forked. How do you do 
 the other part, where the postmaster needs to tell the launcher which 
 worker terminated?

I haven't done that yet, since the current incarnation does not need it.
But I have considered using some signal like SIGUSR1 to mean "something
changed in your processes, look into your shared memory".  The
autovacuum shared memory area would contain PIDs (or maybe PGPROC
pointers?) of workers; so when the launcher goes to check, it notices
that one worker is no longer there, meaning that it must have
terminated its job.
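
A minimal sketch of what that check could look like, purely for illustration
(the struct, field and function names here are assumptions, not the actual
autovacuum code):

/*
 * Illustrative sketch only: the struct, field and variable names below are
 * assumptions, not the actual autovacuum implementation.
 */
#include <signal.h>
#include <sys/types.h>

#define MAX_WORKERS 8

typedef struct
{
    pid_t   workerPids[MAX_WORKERS];    /* 0 = free slot */
} AvShmemStruct;

static AvShmemStruct *AvShmem;          /* attached to shared memory at startup */
static pid_t known_workers[MAX_WORKERS];
static volatile sig_atomic_t got_SIGUSR1 = 0;

/* SIGUSR1 handler in the launcher: just set a flag, do the work later */
static void
launcher_sigusr1(int signum)
{
    got_SIGUSR1 = 1;
}

/* Called from the launcher's main loop whenever got_SIGUSR1 was set */
static void
rescan_workers(void)
{
    int     i;

    for (i = 0; i < MAX_WORKERS; i++)
    {
        if (known_workers[i] != 0 && AvShmem->workerPids[i] != known_workers[i])
        {
            /*
             * The worker that used to occupy this slot is gone, meaning it
             * must have terminated its job; the launcher may start another.
             */
        }
        known_workers[i] = AvShmem->workerPids[i];
    }
}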

 For Postgres-R, I'm currently questioning if I shouldn't merge the 
 replication manager process with the postmaster. Of course, that would 
 violate the "postmaster does not touch shared memory" constraint.
 
 I suggest you don't.  Reliability from Postmaster is very important.
 
 Yes, so? As long as I can't restart the replication manager, but 
 operation of the whole DBMS relies on it, I have to take the postmaster 
 down as soon as it detects a crashed replication manager.

Sure.  But you also need to take down all regular backends, and bgwriter
as well.  If the postmaster just dies, this won't work cleanly.

 That's why I'm questioning if that's the behavior we want. Isn't it 
 better to force the administrators to look into the issue and probably 
 replace a broken node, instead of having one node run amok by 
 requesting recovery over and over again, possibly forcing crashes of 
 other nodes too, because of the additional load for recovery?

Maybe what you want, then, is that when the replication manager dies,
the postmaster should close all processes and then shut itself
down.  This also can be arranged easily.

But just crashing the postmaster because the manager sees something
wrong is certainly not a good idea.

 Well, the point of the postmaster is that it can notice when one process
 dies and take appropriate action.  When a backend dies, the postmaster
 closes all others.  But if the postmaster crashes due to a bug in the
 manager (due to both being integrated in a single process), how do you
 close the backends?  There's no one to do it.

 That's a point.
 
 But again, as long as the replication manager won't be able to restart, 
 you gain nothing by closing backends on a crashed node.

Sure you do -- they won't corrupt anything :-)  Plus, what use are
running backends in a multimaster environment, if they can't communicate
with the outside?  Much better would be, AFAICS, to shut everyone down
so that the users can connect to a working node.

 I guess your problem is that the manager's task is quite a lot more
 involved than my launcher's.  But in that case, it's even more important
 to have them separate.
 
 More involved with what? It does not touch shared memory, it mainly 
 keeps track of the backends' states (by getting a notice from the 
 postmaster) and does all the necessary forwarding of messages between 
 the communication system and the backends. Its main loop is similar to 
 the postmaster's, mainly consisting of a select().

I meant more complicated.  And if it has to listen on a socket and
forward messages to remote backends, it certainly is a lot more
complicated than the current autovac launcher.

 I don't understand why the manager talks to postmaster.  If it doesn't,
 well, then there's no concurrency issue gone, because the remote
 backends will be talking to *somebody* anyway; be it postmaster, or
 manager.
 
 As with your launcher, I only send one message: the worker request. But 
 the other way around, from the postmaster to the replication manager, 
 there are also some messages: a "database is ready" message and 
 "worker terminated" messages. Thinking about handling the restarting 
 cycle, I would need to add a "database is restarting" message, which 
 has to be followed by another "database is ready" message.
 
 For sure, the replication manager needs to keep running during a 
 restarting cycle. And it needs to know the database's state, so as to be 
 able to decide if it can request workers or not.

I think this would be pretty easy to do if you made the remote backends
keep state in shared memory.  The manager just needs to get a signal to
know that it should check the shared memory.  This can be arranged
easily: just have the remote backends signal the postmaster, and have
the postmaster signal the manager.  Alternatively, have the manager PID
stored in shared memory and have the remote backends signal (SIGUSR1 or
some such) the manager.  (bgwriter does this: it announces its PID in
shared memory, and the backends signal it when they want a CHECKPOINT).
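
For illustration, a rough sketch of that second variant (manager PID announced
in shared memory, remote backends poking it directly); the struct and function
names are assumptions, not real Postgres-R or bgwriter code:

/*
 * Illustrative sketch only: struct and function names are assumptions.
 */
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_REMOTE_BACKENDS 64

typedef struct
{
    pid_t   manager_pid;                        /* 0 until the manager started */
    int     backend_state[MAX_REMOTE_BACKENDS]; /* per-backend state to inspect */
} ReplShmemStruct;

static ReplShmemStruct *ReplShmem;              /* attached during shmem init */

/* Manager side: announce our PID so remote backends can signal us directly */
static void
manager_announce_pid(void)
{
    ReplShmem->manager_pid = getpid();
}

/* Remote backend side: publish the new state, then poke the manager so it
 * knows it should go look at shared memory */
static void
backend_report_state(int my_slot, int new_state)
{
    ReplShmem->backend_state[my_slot] = new_state;
    if (ReplShmem->manager_pid != 0)
        kill(ReplShmem->manager_pid, SIGUSR1);
}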

 I 

Re: [HACKERS] autovacuum process handling

2007-01-25 Thread Markus Schiltknecht

Hi,

Alvaro Herrera wrote:

1. There will be two kinds of processes, autovacuum launcher and
autovacuum worker.


Sounds similar to what I do in Postgres-R: one replication manager and 
several replication workers. Those are called "remote backends" (which 
is somewhat of an unfortunate name, IMO).



6. Launcher will start a worker using the following protocol:
   - Set up information on what to run on shared memory
   - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
   - Postmaster will react by starting a worker, and registering it very
 similarly to a regular backend, so it can be shut down easily when
 appropriate.
 (Thus launcher will not be informed right away when worker dies)
   - Worker will examine shared memory to know what to do, clear the
 request, and send a signal to Launcher
   - Launcher wakes up and can start another one if appropriate


It looks like you need much less communication between the launcher and 
the workers, probably also less between the postmaster and the launcher.


For Postgres-R, I'm currently questioning if I shouldn't merge the 
replication manager process with the postmaster. Of course, that would 
violate the "postmaster does not touch shared memory" constraint. But it 
would make some things a lot easier:


 * What if the launcher/manager dies (but you potentially still have
   active workers)?

   Maybe, for autovacuum you can simply restart the launcher and that
   one detects workers from shmem.

   With replication, I certainly have to take down the postmaster as
   well, as we are certainly out of sync and can't simply restart the
   replication manager. So in that case, no postmaster can run without a
   replication manager and vice versa. Why not make it one single
   process, then?

 * Startup races: depending on how you start workers, the launcher/
   manager may get a "database is starting up" error when requesting
   the postmaster to fork backends.
   That probably also applies to autovacuum, as those workers shouldn't
   run concurrently with the startup process. But maybe there are other
   means of ensuring that no autovacuum gets triggered during startup?

 * Simpler debugging: one process less which could fail, and a whole lot
   of concurrency issues (like deadlocks or invalid IPC messages) are
   gone.

So, why do you want to add a special launcher process? Why can't the 
postmaster take care of launching autovacuum workers? It should be 
possible to let the postmaster handle *that* part of the shared memory, 
as it can simply clean it up. Corruptions wouldn't matter, so I don't 
see a problem with that.


(Probably I'm too much focussed on my case, the replication manager.)


Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, 


Looking forward to that one.

Regards

Markus



Re: [HACKERS] autovacuum process handling

2007-01-25 Thread Alvaro Herrera
Markus Schiltknecht wrote:

Hi Markus,

 
 Alvaro Herrera wrote:
 1. There will be two kinds of processes, autovacuum launcher and
 autovacuum worker.
 
 Sounds similar to what I do in Postgres-R: one replication manager and 
 several replication workers. Those are called "remote backends" (which 
 is somewhat of an unfortunate name, IMO).

Oh, yeah, I knew about those and forgot to check them.

 6. Launcher will start a worker using the following protocol:
- Set up information on what to run on shared memory
- invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
- Postmaster will react by starting a worker, and registering it very
  similarly to a regular backend, so it can be shut down easily when
  appropriate.
  (Thus launcher will not be informed right away when worker dies)
- Worker will examine shared memory to know what to do, clear the
  request, and send a signal to Launcher
- Launcher wakes up and can start another one if appropriate
 
 It looks like you need much less communication between the launcher and 
 the workers, probably also less between the postmaster and the launcher.

Yeah.  For what I need, the launcher just needs to know when a worker
has finished and how many workers there are.


 For Postgres-R, I'm currently questioning if I shouldn't merge the 
 replication manager process with the postmaster. Of course, that would 
 violate the "postmaster does not touch shared memory" constraint.

I suggest you don't.  Reliability from Postmaster is very important.

 But it would make some things a lot easier:
 
  * What if the launcher/manager dies (but you potentially still have
active workers)?
 
Maybe, for autovacuum you can simply restart the launcher and that
one detects workers from shmem.
 
With replication, I certainly have to take down the postmaster as
well, as we are certainly out of sync and can't simply restart the
replication manager. So in that case, no postmaster can run without a
replication manager and vice versa. Why not make it one single
process, then?

Well, the point of the postmaster is that it can notice when one process
dies and take appropriate action.  When a backend dies, the postmaster
closes all others.  But if the postmaster crashes due to a bug in the
manager (due to both being integrated in a single process), how do you
close the backends?  There's no one to do it.

When the logger process dies, the postmaster just starts a new one.  But
when the bgwriter dies, it must cause a restart cycle as well.  The
postmaster knows which process died, so it knows how to act.  If the
manager dies, the postmaster is certainly able to stop all other
processes and restart the whole thing.

In my case, the launcher is not critical.  It can die and the postmaster
should just start a new one without much noise.  A worker is critical
because it's connected to tables; it's as critical as a regular backend.
So if a worker dies, the postmaster must take everyone down and cause a
restart.  This is pretty easy to do.
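
As a rough sketch of that distinction, the postmaster's SIGCHLD handling could
look something like this; the PID variables and helper functions are
assumptions, not the actual reaper() code:

/*
 * Illustrative sketch only: variable and function names are assumptions,
 * not the real postmaster reaper() code, and the "did it exit cleanly?"
 * checks are omitted.
 */
#include <sys/types.h>
#include <sys/wait.h>

extern pid_t SysLoggerPID;
extern pid_t AvLauncherPID;

extern pid_t start_syslogger(void);
extern pid_t start_avlauncher(void);
extern void  HandleChildCrash(pid_t pid, int exitstatus);

static void
reap_children(void)
{
    pid_t   pid;
    int     status;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
    {
        if (pid == SysLoggerPID)
        {
            /* non-critical: just start a new one */
            SysLoggerPID = start_syslogger();
        }
        else if (pid == AvLauncherPID)
        {
            /* launcher is non-critical too: restart it without much noise */
            AvLauncherPID = start_avlauncher();
        }
        else
        {
            /*
             * bgwriter, a regular backend or an autovacuum worker: these are
             * critical, so take everyone down and go through a restart cycle.
             */
            HandleChildCrash(pid, status);
        }
    }
}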

  * Startup races: depending on how you start workers, the launcher/
    manager may get a "database is starting up" error when requesting
    the postmaster to fork backends.
    That probably also applies to autovacuum, as those workers shouldn't
    run concurrently with the startup process. But maybe there are other
    means of ensuring that no autovacuum gets triggered during startup?

Oh, this is very easy as well.  In my case the launcher just sets a
database OID to be processed in shared memory, and then calls
SendPostmasterSignal with a particular value.  The postmaster must only
check this signal within ServerLoop, which means it won't act on it
(i.e., won't start a worker) until the startup process has finished.

The worker is very much like a regular backend.  It starts up, and then
checks this shared memory.  If there's a database OID in there, it
removes the OID from shared memory, then connects to the database and
does a vacuum cycle.
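
A compressed sketch of those two sides (the postmaster acting on the signal
only from ServerLoop, and the worker picking up the OID); the shared struct
and helper names are assumptions layered on top of the real
SendPostmasterSignal/CheckPostmasterSignal interface:

/*
 * Illustrative sketch only: AvShmemStruct, AvShmem, AvLauncherPID and
 * StartAutoVacWorker() are assumptions; CheckPostmasterSignal() and
 * PMSIGNAL_START_AUTOVAC_WORKER are the pmsignal interface referred to above.
 */
#include "postgres.h"

#include <signal.h>

#include "storage/pmsignal.h"

typedef struct
{
    Oid     av_startingDb;      /* database to vacuum, InvalidOid if none */
} AvShmemStruct;

extern AvShmemStruct *AvShmem;
extern pid_t AvLauncherPID;
extern int   StartAutoVacWorker(void);

/*
 * Postmaster side: only called from within ServerLoop(), i.e. never while
 * the startup process is still running -- which is what avoids the race.
 */
static void
postmaster_check_autovac_signal(void)
{
    if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
        (void) StartAutoVacWorker();    /* fork a worker, registered much like a backend */
}

/*
 * Worker side: grab the request, clear it, tell the launcher, then connect
 * to the database and run a vacuum cycle.
 */
static void
worker_pick_up_request(void)
{
    Oid     dbid = AvShmem->av_startingDb;

    if (OidIsValid(dbid))
    {
        AvShmem->av_startingDb = InvalidOid;    /* clear the request */
        kill(AvLauncherPID, SIGUSR1);           /* launcher may start another one */
        /* ... connect to dbid and do a vacuum cycle ... */
    }
}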

  * Simpler debugging: one process less which could fail, and a whole lot
of concurrency issues (like deadlocks or invalid IPC messages) are
gone.

I guess your problem is that the manager's task is quite a lot more
involved than my launcher's.  But in that case, it's even more important
to have them separate.

I don't understand why the manager talks to postmaster.  If it doesn't,
well, then there's no concurrency issue gone, because the remote
backends will be talking to *somebody* anyway; be it postmaster, or
manager.

(Maybe your problem is that the manager is not correctly designed.  We
can talk about checking that code.  I happen to know the Postmaster
process handling code because of my previous work with Autovacuum and
because of Mammoth Replicator.)

 So, why do you want to add a special launcher process? Why can't the 
 postmaster take care of launching autovacuum workers? It should be 
 possible to let the postmaster handle *that* part of 

[HACKERS] autovacuum process handling

2007-01-22 Thread Alvaro Herrera
Hi,

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:

1. There will be two kinds of processes, autovacuum launcher and
autovacuum worker.

2. The launcher will be in charge of scheduling and will tell workers
what to do

3. The workers will be similar to what autovacuum does today: start when
somebody else tells them to start, process a single item (be it a table or
a database) and terminate

4. Launcher will be a continuously-running process, akin to bgwriter;
connected to shared memory

5. Workers will be direct postmaster children; so postmaster will get
SIGCHLD when worker dies

6. Launcher will start a worker using the following protocol:
   - Set up information on what to run on shared memory
   - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
   - Postmaster will react by starting a worker, and registering it very
     similarly to a regular backend, so it can be shut down easily when
     appropriate.
     (Thus launcher will not be informed right away when worker dies)
   - Worker will examine shared memory to know what to do, clear the
     request, and send a signal to Launcher
   - Launcher wakes up and can start another one if appropriate
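
A minimal sketch of the launcher's half of step 6, under the same caveat that
the shared struct and field are assumptions rather than the actual patch:

/*
 * Illustrative sketch only: AvShmemStruct and AvShmem are assumptions;
 * SendPostmasterSignal() and PMSIGNAL_START_AUTOVAC_WORKER are the pmsignal
 * interface this proposal refers to.
 */
#include "postgres.h"

#include "storage/pmsignal.h"

typedef struct
{
    Oid     av_startingDb;      /* InvalidOid when no request is pending */
} AvShmemStruct;

extern AvShmemStruct *AvShmem;

/*
 * Launcher side: publish the request in shared memory, then ask the
 * postmaster to fork a worker.  The launcher hears back only when the
 * worker has cleared the request and signals it.
 */
static void
launcher_request_worker(Oid dbid)
{
    AvShmem->av_startingDb = dbid;
    SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
}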

Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:


[snip details]


Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.


I can't really speak to the PostgreSQL signaling innards, but this sounds 
logical to me.  I think having the worker processes be children of the 
postmaster and having them be single-minded (or single-tasked) also 
makes a lot of sense.




Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
 4. Launcher will be a continuously-running process, akin to bgwriter;
 connected to shared memory
 
So would it use up a database connection?

 5. Workers will be direct postmaster children; so postmaster will get
 SIGCHLD when worker dies

As part of this I think we need to make it more obvious how all of this
ties into max_connections. Currently, autovac ties up one of the
super-user connections whenever it's not asleep; these changes would
presumably mean that more of those connections could be tied up.

Rather than forcing users to worry about adjusting max_connections and
superuser_reserved_connections to accommodate autovacuum, the system
should handle it for them.

Were you planning on limiting the number of concurrent vacuum processes
that could be running? If so, we could probably just increase superuser
connections by that amount. If not, we might need to think of something
else...
-- 
Jim Nasby                                      [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Alvaro Herrera
Jim C. Nasby wrote:
 On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
  4. Launcher will be a continuously-running process, akin to bgwriter;
  connected to shared memory
  
 So would it use up a database connection?

No.  It's connected to shared memory and has access to pgstats, but it's
not connected to any database so it's not counted.  You'd say it has the
same status as the bgwriter.

  5. Workers will be direct postmaster children; so postmaster will get
  SIGCHLD when worker dies
 
 As part of this I think we need to make it more obvious how all of this
 ties into max_connections. Currently, autovac ties up one of the
 super-user connections whenever it's not asleep; these changes would
 presumably mean that more of those connections could be tied up.

Sure.

 Rather than forcing users to worry about adjusting max_connections and
 superuser_reserved_connections to accommodate autovacuum, the system
 should handle it for them.
 
 Were you planning on limiting the number of concurrent vacuum processes
 that could be running? If so, we could probably just increase superuser
 connections by that amount. If not, we might need to think of something
 else...

The fact that I'm currently narrowly focused on process handling means
that I don't want to touch scheduling at all for now, so I'm gonna make
it so that the launcher decides to launch a worker run only when no
other worker is running.  Thus only a single vacuum thread at any
time.  In the meantime you're welcome to think on the possible solutions
to that problem, which we'll have to attack at some point in the
(hopefully) near future ;-)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
