Re: [HACKERS] autovacuum process handling
Hi,

Alvaro Herrera wrote:
> I haven't done that yet, since the current incarnation does not need
> it. But I have considered using some signal like SIGUSR1 to mean
> "something changed in your processes, look into your shared memory".
> The autovacuum shared memory area would contain the PIDs (or maybe
> PGPROC pointers?) of workers; so when the launcher goes to check, it
> notices that one worker is no longer there, meaning that it must have
> terminated its job.

Meaning the launcher must keep a list of currently known worker PIDs and compare that to the list in shared memory. This is doable, but quite a lot of code for something the postmaster gets for free (i.e. SIGCHLD).

> Sure you do -- they won't corrupt anything :-) Plus, what use are
> running backends in a multimaster environment if they can't
> communicate with the outside? Much better would be, AFAICS, to shut
> everyone down so that the users can connect to a working node.

You are right here. I'll have to recheck my code and make sure I 'take down' the postmaster in a decent way (i.e. make it terminate its children immediately, so that they can't commit anymore).

>> More involved with what? It does not touch shared memory; it mainly
>> keeps track of the backends' states (by getting a notice from the
>> postmaster) and does all the necessary forwarding of messages between
>> the communication system and the backends. Its main loop is similar
>> to the postmaster's, mainly consisting of a select().
>
> I meant "more complicated". And if it has to listen on a socket and
> forward messages to remote backends, it certainly is a lot more
> complicated than the current autovac launcher.

That may well be. My point was that my replication manager is so similar to the postmaster that it is a real PITA to do that much coding just to make it a separate process.

>> For sure, the replication manager needs to keep running during a
>> restarting cycle. And it needs to know the database's state, so as to
>> be able to decide if it can request workers or not.
>
> I think this would be pretty easy to do if you made the remote
> backends keep state in shared memory. The manager just needs to get a
> signal to know that it should check the shared memory. This can be
> arranged easily: just have the remote backends signal the postmaster,
> and have the postmaster signal the manager. Alternatively, have the
> manager PID stored in shared memory and have the remote backends
> signal (SIGUSR1 or some such) the manager. (bgwriter does this: it
> announces its PID in shared memory, and the backends signal it when
> they want a CHECKPOINT.)

Sounds like we'll run out of signals soon. ;-)

I also have to pass around data (writesets), which is why I've come up with that IMessage stuff. It's a per-process message queue in shared memory, using SIGUSR1 to signal new messages. It works, but as I said, I found myself adding messages for all the postmaster events, so I've really begun to question what to do in which process.

Again, thanks for your inputs.

Markus
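For readers who haven't seen the IMessage code, a minimal standalone sketch of the idea described here -- a per-process message queue in shared memory, with SIGUSR1 as the wakeup -- might look as follows. Plain POSIX calls (mmap, fork, kill) stand in for Postgres' shared memory and process machinery, and every name is made up for illustration; this is not the actual Postgres-R code:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define IMSG_SLOTS   8
    #define IMSG_PAYLOAD 64

    /* single-writer/single-reader ring; a real implementation would
     * need locking or memory barriers around head and tail */
    typedef struct
    {
        volatile int head;      /* next slot the writer fills */
        volatile int tail;      /* next slot the reader drains */
        char payload[IMSG_SLOTS][IMSG_PAYLOAD];
    } imsg_queue;

    static volatile sig_atomic_t got_message = 0;
    static void on_sigusr1(int sig) { (void) sig; got_message = 1; }

    /* writer: append a message, then wake the target process */
    static int imsg_put(imsg_queue *q, pid_t target, const char *msg)
    {
        int next = (q->head + 1) % IMSG_SLOTS;

        if (next == q->tail)
            return -1;          /* queue full */
        strncpy(q->payload[q->head], msg, IMSG_PAYLOAD - 1);
        q->payload[q->head][IMSG_PAYLOAD - 1] = '\0';
        q->head = next;
        return kill(target, SIGUSR1);
    }

    int main(void)
    {
        imsg_queue *q = mmap(NULL, sizeof(*q), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        sigset_t mask, old;

        q->head = q->tail = 0;

        /* block SIGUSR1 before forking, so the reader cannot lose the
         * wakeup between checking the flag and going to sleep */
        sigemptyset(&mask);
        sigaddset(&mask, SIGUSR1);
        sigprocmask(SIG_BLOCK, &mask, &old);

        pid_t reader = fork();
        if (reader == 0)
        {
            signal(SIGUSR1, on_sigusr1);
            while (!got_message)
                sigsuspend(&old);       /* sleep until signalled */
            while (q->tail != q->head)  /* drain the queue */
            {
                printf("got message: %s\n", q->payload[q->tail]);
                q->tail = (q->tail + 1) % IMSG_SLOTS;
            }
            return 0;
        }

        imsg_put(q, reader, "worker terminated");
        waitpid(reader, NULL, 0);
        munmap(q, sizeof(*q));
        return 0;
    }

Compiled and run, the reader wakes exactly once and prints the single queued message; everything beyond that (multiple writers, overflow handling) is where the real code gets hairy.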
Re: [HACKERS] autovacuum process handling
Hi,

Alvaro Herrera wrote:
> Yeah. For what I need, the launcher just needs to know when a worker
> has finished and how many workers there are.

Oh, so it's not all that much less communication. My replication manager also needs to know when a worker dies. You said you are using a signal from manager to postmaster to request a worker to be forked. How do you do the other part, where the postmaster needs to tell the launcher which worker terminated?

>> For Postgres-R, I'm currently questioning if I shouldn't merge the
>> replication manager process with the postmaster. Of course, that
>> would violate the "postmaster does not touch shared memory"
>> constraint.
>
> I suggest you don't. Reliability of the postmaster is very important.

Yes, so? As long as I can't restart the replication manager, but operation of the whole DBMS relies on it, I have to take the postmaster down as soon as it detects a crashed replication manager. So I still argue that reliability is getting better than the status quo if I'm merging these two processes (because of less code for communication between the two).

Of course, the other way to gain reliability would be to make the replication manager restartable. But restarting the replication manager means recovering data from other nodes in the cluster, thus a lot of network traffic. Needless to say, this is quite an expensive operation. That's why I'm questioning if that's the behavior we want. Isn't it better to force the administrators to look into the issue and probably replace a broken node, instead of having one node going amok by requesting recovery over and over again, possibly forcing crashes of other nodes, too, because of the additional load for recovery?

>> But it would make some things a lot easier:
>>
>> * What if the launcher/manager dies (but you potentially still have
>>   active workers)? Maybe, for autovacuum, you can simply restart the
>>   launcher and have it detect workers from shmem. With replication, I
>>   certainly have to take down the postmaster as well, as we are
>>   certainly out of sync and can't simply restart the replication
>>   manager. So in that case, no postmaster can run without a
>>   replication manager and vice versa. Why not make it one single
>>   process, then?
>
> Well, the point of the postmaster is that it can notice when one
> process dies and take appropriate action. When a backend dies, the
> postmaster closes all others. But if the postmaster crashes due to a
> bug in the manager (due to both being integrated in a single process),
> how do you close the backends? There's no one to do it.

That's a point. But again, as long as the replication manager won't be able to restart, you gain nothing by closing backends on a crashed node.

> In my case, the launcher is not critical. It can die and the
> postmaster should just start a new one without much noise. A worker is
> critical because it's connected to tables; it's as critical as a
> regular backend. So if a worker dies, the postmaster must take
> everyone down and cause a restart. This is pretty easy to do.

Yeah, that's the main difference, and I see why your approach makes perfect sense for the autovacuum case. In contrast, the replication manager is critical (to one node), and a restart is expensive (for the whole cluster).

>> * Startup races: depending on how you start workers, the
>>   launcher/manager may get a "database is starting up" error when
>>   requesting the postmaster to fork backends. That probably also
>>   applies to autovacuum, as those workers shouldn't run concurrently
>>   with the startup process. But maybe there are other means of
>>   ensuring that no autovacuum gets triggered during startup?
>
> Oh, this is very easy as well. In my case the launcher just sets a
> database OID to be processed in shared memory, and then calls
> SendPostmasterSignal with a particular value. The postmaster must only
> check this signal within ServerLoop, which means it won't act on it
> (i.e., won't start a worker) until the startup process has finished.

It seems like your launcher is perfectly fine with requesting workers and not getting them. The replication manager currently isn't. Maybe I should make it more fault tolerant in that regard...

> I guess your problem is that the manager's task is quite a lot more
> involved than my launcher's. But in that case, it's even more
> important to have them separate.

More involved with what? It does not touch shared memory; it mainly keeps track of the backends' states (by getting a notice from the postmaster) and does all the necessary forwarding of messages between the communication system and the backends. Its main loop is similar to the postmaster's, mainly consisting of a select().

> I don't understand why the manager talks to postmaster. If it doesn't,
> well, then there's no concurrency issue gone, because the remote
> backends will be talking to *somebody* anyway; be it postmaster, or
> manager.

As with your launcher, I only send one message: the worker request. But the other way around, from the postmaster to the replication manager, there are also some messages: a "database is ready" message and a "worker terminated" message. Thinking about handling the restarting cycle, I would need to add a "database is restarting" message, which has to be followed by another "database is ready" message. For sure, the replication manager needs to keep running during a restarting cycle. And it needs to know the database's state, so as to be able to decide if it can request workers or not.
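A self-contained toy of the select()-based manager loop mentioned above might look as follows; socketpair() stands in for both the group communication link and the backend connection, and all of it is illustrative rather than actual Postgres-R code:

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int comm[2], backend[2];

        socketpair(AF_UNIX, SOCK_STREAM, 0, comm);     /* manager <-> network */
        socketpair(AF_UNIX, SOCK_STREAM, 0, backend);  /* manager <-> backend */

        if (fork() == 0)            /* pretend remote node: sends a writeset */
        {
            write(comm[1], "writeset A", 10);
            return 0;
        }
        if (fork() == 0)            /* pretend local backend: applies and acks */
        {
            char buf[64];
            if (read(backend[1], buf, sizeof(buf)) > 0)
                write(backend[1], "applied", 7);
            return 0;
        }

        int done = 0;
        while (!done)               /* the manager's main loop */
        {
            fd_set rset;
            char buf[64];
            ssize_t n;

            FD_ZERO(&rset);
            FD_SET(comm[0], &rset);
            FD_SET(backend[0], &rset);
            int maxfd = (comm[0] > backend[0] ? comm[0] : backend[0]) + 1;

            if (select(maxfd, &rset, NULL, NULL, NULL) < 0)
                continue;           /* EINTR: interrupted by a signal */

            if (FD_ISSET(comm[0], &rset))       /* network -> backend */
            {
                n = read(comm[0], buf, sizeof(buf));
                if (n > 0)
                    write(backend[0], buf, n);
            }
            if (FD_ISSET(backend[0], &rset))    /* backend -> manager */
            {
                n = read(backend[0], buf, sizeof(buf));
                if (n > 0)
                {
                    printf("manager: backend says %.*s\n", (int) n, buf);
                    done = 1;
                }
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }

The point of the shape, as in the postmaster, is that everything funnels through one blocking select(), so the process reacts to whichever side speaks first.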
Re: [HACKERS] autovacuum process handling
Markus Schiltknecht wrote:
> Hi,
>
> Alvaro Herrera wrote:
>> Yeah. For what I need, the launcher just needs to know when a worker
>> has finished and how many workers there are.
>
> Oh, so it's not all that much less communication. My replication
> manager also needs to know when a worker dies. You said you are using
> a signal from manager to postmaster to request a worker to be forked.
> How do you do the other part, where the postmaster needs to tell the
> launcher which worker terminated?

I haven't done that yet, since the current incarnation does not need it. But I have considered using some signal like SIGUSR1 to mean "something changed in your processes, look into your shared memory". The autovacuum shared memory area would contain the PIDs (or maybe PGPROC pointers?) of workers; so when the launcher goes to check, it notices that one worker is no longer there, meaning that it must have terminated its job.

>>> For Postgres-R, I'm currently questioning if I shouldn't merge the
>>> replication manager process with the postmaster. Of course, that
>>> would violate the "postmaster does not touch shared memory"
>>> constraint.
>>
>> I suggest you don't. Reliability of the postmaster is very important.
>
> Yes, so? As long as I can't restart the replication manager, but
> operation of the whole DBMS relies on it, I have to take the
> postmaster down as soon as it detects a crashed replication manager.

Sure. But you also need to take down all regular backends, and bgwriter as well. If the postmaster just dies, this won't work cleanly.

> That's why I'm questioning if that's the behavior we want. Isn't it
> better to force the administrators to look into the issue and probably
> replace a broken node, instead of having one node going amok by
> requesting recovery over and over again, possibly forcing crashes of
> other nodes, too, because of the additional load for recovery?

Maybe what you want, then, is that when the replication manager dies, the postmaster should close all processes and then shut itself down. This also can be arranged easily. But just crashing the postmaster because the manager sees something wrong is certainly not a good idea.

>> Well, the point of the postmaster is that it can notice when one
>> process dies and take appropriate action. When a backend dies, the
>> postmaster closes all others. But if the postmaster crashes due to a
>> bug in the manager (due to both being integrated in a single
>> process), how do you close the backends? There's no one to do it.
>
> That's a point. But again, as long as the replication manager won't be
> able to restart, you gain nothing by closing backends on a crashed
> node.

Sure you do -- they won't corrupt anything :-) Plus, what use are running backends in a multimaster environment if they can't communicate with the outside? Much better would be, AFAICS, to shut everyone down so that the users can connect to a working node.

>> I guess your problem is that the manager's task is quite a lot more
>> involved than my launcher's. But in that case, it's even more
>> important to have them separate.
>
> More involved with what? It does not touch shared memory; it mainly
> keeps track of the backends' states (by getting a notice from the
> postmaster) and does all the necessary forwarding of messages between
> the communication system and the backends. Its main loop is similar to
> the postmaster's, mainly consisting of a select().

I meant "more complicated". And if it has to listen on a socket and forward messages to remote backends, it certainly is a lot more complicated than the current autovac launcher.

>> I don't understand why the manager talks to postmaster. If it
>> doesn't, well, then there's no concurrency issue gone, because the
>> remote backends will be talking to *somebody* anyway; be it
>> postmaster, or manager.
>
> As with your launcher, I only send one message: the worker request.
> But the other way around, from the postmaster to the replication
> manager, there are also some messages: a "database is ready" message
> and a "worker terminated" message. Thinking about handling the
> restarting cycle, I would need to add a "database is restarting"
> message, which has to be followed by another "database is ready"
> message. For sure, the replication manager needs to keep running
> during a restarting cycle. And it needs to know the database's state,
> so as to be able to decide if it can request workers or not.

I think this would be pretty easy to do if you made the remote backends keep state in shared memory. The manager just needs to get a signal to know that it should check the shared memory. This can be arranged easily: just have the remote backends signal the postmaster, and have the postmaster signal the manager. Alternatively, have the manager PID stored in shared memory and have the remote backends signal (SIGUSR1 or some such) the manager. (bgwriter does this: it announces its PID in shared memory, and the backends signal it when they want a CHECKPOINT.)
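That last pattern is simple enough to sketch standalone. In the toy below, plain POSIX stands in for Postgres' shmem layer and all names are hypothetical: a "manager" announces its PID in shared memory, and a "backend" signals it directly, without involving the postmaster at all:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    typedef struct
    {
        volatile pid_t manager_pid;    /* 0 until the manager has started */
        volatile int   state_changed;  /* what the manager should look at */
    } manager_shmem;

    static volatile sig_atomic_t wakeup = 0;
    static void on_sigusr1(int sig) { (void) sig; wakeup = 1; }

    int main(void)
    {
        manager_shmem *sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        sigset_t mask, old;

        sigemptyset(&mask);            /* block SIGUSR1; children inherit it */
        sigaddset(&mask, SIGUSR1);
        sigprocmask(SIG_BLOCK, &mask, &old);

        if (fork() == 0)               /* the manager */
        {
            signal(SIGUSR1, on_sigusr1);
            sh->manager_pid = getpid();         /* announce ourselves */
            while (!wakeup)
                sigsuspend(&old);               /* sleep until poked */
            printf("manager: state_changed = %d\n", sh->state_changed);
            return 0;
        }

        /* a backend: flag a state change and poke the manager directly */
        while (sh->manager_pid == 0)
            usleep(1000);              /* wait for the announcement */
        sh->state_changed = 1;
        kill(sh->manager_pid, SIGUSR1);

        wait(NULL);
        munmap(sh, sizeof(*sh));
        return 0;
    }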
Re: [HACKERS] autovacuum process handling
Hi,

Alvaro Herrera wrote:
> 1. There will be two kinds of processes, autovacuum launcher and
>    autovacuum worker.

Sounds similar to what I do in Postgres-R: one replication manager and several replication workers. Those are called remote backends (which is somewhat of an unfortunate name, IMO).

> 6. Launcher will start a worker using the following protocol:
>    - Set up information on what to run on shared memory
>    - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
>    - Postmaster will react by starting a worker, and registering it
>      very similarly to a regular backend, so it can be shut down
>      easily when appropriate. (Thus launcher will not be informed
>      right away when worker dies.)
>    - Worker will examine shared memory to know what to do, clear the
>      request, and send a signal to Launcher
>    - Launcher wakes up and can start another one if appropriate

It looks like you need much less communication between the launcher and the workers, probably also less between the postmaster and the launcher.

For Postgres-R, I'm currently questioning if I shouldn't merge the replication manager process with the postmaster. Of course, that would violate the "postmaster does not touch shared memory" constraint. But it would make some things a lot easier:

* What if the launcher/manager dies (but you potentially still have active workers)? Maybe, for autovacuum, you can simply restart the launcher and have it detect workers from shmem. With replication, I certainly have to take down the postmaster as well, as we are certainly out of sync and can't simply restart the replication manager. So in that case, no postmaster can run without a replication manager and vice versa. Why not make it one single process, then?

* Startup races: depending on how you start workers, the launcher/manager may get a "database is starting up" error when requesting the postmaster to fork backends. That probably also applies to autovacuum, as those workers shouldn't run concurrently with the startup process. But maybe there are other means of ensuring that no autovacuum gets triggered during startup?

* Simpler debugging: one process less which could fail, and a whole lot of concurrency issues (like deadlocks or invalid IPC messages) are gone.

So, why do you want to add a special launcher process? Why can't the postmaster take care of launching autovacuum workers? It should be possible to let the postmaster handle *that* part of the shared memory, as it can simply clean it up. Corruption wouldn't matter, so I don't see a problem with that. (Probably I'm too much focused on my case, the replication manager.)

> Does this raise some red flags? It seems straightforward enough to me;
> I'll submit a patch implementing this,

Looking forward to that one.

Regards

Markus
Re: [HACKERS] autovacuum process handling
Hi Markus,

Markus Schiltknecht wrote:
> Alvaro Herrera wrote:
>> 1. There will be two kinds of processes, autovacuum launcher and
>>    autovacuum worker.
>
> Sounds similar to what I do in Postgres-R: one replication manager and
> several replication workers. Those are called remote backends (which
> is somewhat of an unfortunate name, IMO).

Oh, yeah, I knew about those and forgot to check them.

>> 6. Launcher will start a worker using the following protocol:
>>    - Set up information on what to run on shared memory
>>    - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
>>    - Postmaster will react by starting a worker, and registering it
>>      very similarly to a regular backend, so it can be shut down
>>      easily when appropriate. (Thus launcher will not be informed
>>      right away when worker dies.)
>>    - Worker will examine shared memory to know what to do, clear the
>>      request, and send a signal to Launcher
>>    - Launcher wakes up and can start another one if appropriate
>
> It looks like you need much less communication between the launcher
> and the workers, probably also less between the postmaster and the
> launcher.

Yeah. For what I need, the launcher just needs to know when a worker has finished and how many workers there are.

> For Postgres-R, I'm currently questioning if I shouldn't merge the
> replication manager process with the postmaster. Of course, that would
> violate the "postmaster does not touch shared memory" constraint.

I suggest you don't. Reliability of the postmaster is very important.

> But it would make some things a lot easier:
>
> * What if the launcher/manager dies (but you potentially still have
>   active workers)? Maybe, for autovacuum, you can simply restart the
>   launcher and have it detect workers from shmem. With replication, I
>   certainly have to take down the postmaster as well, as we are
>   certainly out of sync and can't simply restart the replication
>   manager. So in that case, no postmaster can run without a
>   replication manager and vice versa. Why not make it one single
>   process, then?

Well, the point of the postmaster is that it can notice when one process dies and take appropriate action. When a backend dies, the postmaster closes all others. But if the postmaster crashes due to a bug in the manager (due to both being integrated in a single process), how do you close the backends? There's no one to do it.

When the logger process dies, the postmaster just starts a new one. But when the bgwriter dies, it must cause a restart cycle as well. The postmaster knows which process died, so it knows how to act. If the manager dies, the postmaster is certainly able to stop all other processes and restart the whole thing.

In my case, the launcher is not critical. It can die and the postmaster should just start a new one without much noise. A worker is critical because it's connected to tables; it's as critical as a regular backend. So if a worker dies, the postmaster must take everyone down and cause a restart. This is pretty easy to do.

> * Startup races: depending on how you start workers, the
>   launcher/manager may get a "database is starting up" error when
>   requesting the postmaster to fork backends. That probably also
>   applies to autovacuum, as those workers shouldn't run concurrently
>   with the startup process. But maybe there are other means of
>   ensuring that no autovacuum gets triggered during startup?

Oh, this is very easy as well. In my case the launcher just sets a database OID to be processed in shared memory, and then calls SendPostmasterSignal with a particular value. The postmaster must only check this signal within ServerLoop, which means it won't act on it (i.e., won't start a worker) until the startup process has finished.

The worker is very much like a regular backend. It starts up, and then checks this shared memory. If there's a database OID in there, it removes the OID from shared memory, then connects to the database and does a vacuum cycle.

> * Simpler debugging: one process less which could fail, and a whole
>   lot of concurrency issues (like deadlocks or invalid IPC messages)
>   are gone.

I guess your problem is that the manager's task is quite a lot more involved than my launcher's. But in that case, it's even more important to have them separate.

I don't understand why the manager talks to the postmaster. If it doesn't, well, then there's no concurrency issue gone, because the remote backends will be talking to *somebody* anyway; be it postmaster, or manager. (Maybe your problem is that the manager is not correctly designed. We can talk about checking that code. I happen to know the postmaster process handling code because of my previous work with autovacuum and because of Mammoth Replicator.)

> So, why do you want to add a special launcher process? Why can't the
> postmaster take care of launching autovacuum workers? It should be
> possible to let the postmaster handle *that* part of the shared
> memory, as it can simply clean it up.
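The "only check the signal within ServerLoop" trick is worth a tiny model. In the illustrative sketch below (none of these names are the real postmaster's), the signal handler merely latches a flag, and the main loop refuses to act on it until startup is over, which is exactly what makes the startup race go away:

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t start_worker_request = 0;
    static void on_pmsignal(int sig) { (void) sig; start_worker_request = 1; }

    int main(void)
    {
        int in_startup = 1;            /* pretend recovery is still running */

        signal(SIGUSR1, on_pmsignal);
        kill(getpid(), SIGUSR1);       /* a launcher request arrives "early" */

        for (int tick = 0; tick < 3; tick++)    /* the "ServerLoop" */
        {
            if (tick == 2)
                in_startup = 0;        /* startup process has finished */

            if (start_worker_request && !in_startup)
            {
                start_worker_request = 0;
                printf("tick %d: forking an autovacuum worker now\n", tick);
            }
            else if (start_worker_request)
                printf("tick %d: request pending, still starting up\n", tick);
        }
        return 0;
    }

The request simply stays latched across the first two iterations and is only honored on the third, once "startup" is done.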
[HACKERS] autovacuum process handling
Hi,

This is how I think autovacuum should change with an eye towards being able to run multiple vacuums simultaneously:

1. There will be two kinds of processes, autovacuum launcher and autovacuum worker.

2. The launcher will be in charge of scheduling and will tell workers what to do.

3. The workers will be similar to what autovacuum does today: start when somebody else tells it to start, process a single item (be it a table or a database) and terminate.

4. Launcher will be a continuously-running process, akin to bgwriter; connected to shared memory.

5. Workers will be direct postmaster children; so postmaster will get SIGCHLD when worker dies.

6. Launcher will start a worker using the following protocol:
   - Set up information on what to run on shared memory
   - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
   - Postmaster will react by starting a worker, and registering it very similarly to a regular backend, so it can be shut down easily when appropriate. (Thus launcher will not be informed right away when worker dies.)
   - Worker will examine shared memory to know what to do, clear the request, and send a signal to Launcher
   - Launcher wakes up and can start another one if appropriate

Does this raise some red flags? It seems straightforward enough to me; I'll submit a patch implementing this, so that scheduling will continue to be as it is today. Thus the scheduling discussions are being deferred until they can be actually useful and implementable.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
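To make the protocol in item 6 concrete, here is a self-contained simulation of the handshake, with plain POSIX primitives standing in for Postgres' shared memory and SendPostmasterSignal(); the flow follows the steps above, but all of the code is illustrative:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    typedef struct
    {
        volatile unsigned dbid;        /* 0 = no request pending */
        volatile pid_t    launcher_pid;
    } av_shmem;

    static volatile sig_atomic_t signalled = 0;
    static void handler(int sig) { (void) sig; signalled = 1; }

    static void await(const sigset_t *old)
    {
        while (!signalled)
            sigsuspend(old);           /* sleep until SIGUSR1 arrives */
        signalled = 0;
    }

    int main(void)                     /* plays the postmaster */
    {
        av_shmem *sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        sigset_t mask, old;

        sigemptyset(&mask);            /* block SIGUSR1; children inherit it */
        sigaddset(&mask, SIGUSR1);
        sigprocmask(SIG_BLOCK, &mask, &old);
        signal(SIGUSR1, handler);

        pid_t launcher = fork();
        if (launcher == 0)             /* launcher: request a worker for db 42 */
        {
            sh->launcher_pid = getpid();
            sh->dbid = 42;             /* step 1: set up info in "shmem" */
            kill(getppid(), SIGUSR1);  /* step 2: "SendPostmasterSignal()" */
            await(&old);               /* step 5: sleep until the worker acks */
            printf("launcher: request picked up, could start another\n");
            return 0;
        }

        await(&old);                   /* step 3: postmaster reacts in its loop */
        pid_t worker = fork();
        if (worker == 0)               /* the worker, much like a backend */
        {
            unsigned dbid = sh->dbid;
            sh->dbid = 0;                       /* step 4: clear the request */
            kill(sh->launcher_pid, SIGUSR1);    /* ... and signal the launcher */
            printf("worker %d: vacuuming database %u\n", (int) getpid(), dbid);
            return 0;
        }

        while (wait(NULL) > 0)         /* SIGCHLD arrives here "for free" */
            ;
        munmap(sh, sizeof(*sh));
        return 0;
    }

Blocking SIGUSR1 before the forks means a signal sent before the receiver reaches sigsuspend() stays pending rather than being lost, which is the same reason the flag-plus-main-loop pattern is used in the real postmaster.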
Re: [HACKERS] autovacuum process handling
Alvaro Herrera wrote:
> This is how I think autovacuum should change with an eye towards being
> able to run multiple vacuums simultaneously:
> [snip details]
> Does this raise some red flags? It seems straightforward enough to me;
> I'll submit a patch implementing this, so that scheduling will
> continue to be as it is today. Thus the scheduling discussions are
> being deferred until they can be actually useful and implementable.

I can't really speak to the PostgreSQL signaling innards, but this sounds logical to me. I think having the worker processes be children of the postmaster and having them be single-minded (or single-tasked) also makes a lot of sense.
Re: [HACKERS] autovacuum process handling
On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
> 4. Launcher will be a continuously-running process, akin to bgwriter;
>    connected to shared memory.

So would it use up a database connection?

> 5. Workers will be direct postmaster children; so postmaster will get
>    SIGCHLD when worker dies.

As part of this I think we need to make it more obvious how all of this ties into max_connections. Currently, autovac ties up one of the super-user connections whenever it's not asleep; these changes would presumably mean that more of those connections could be tied up. Rather than forcing users to worry about adjusting max_connections and superuser_reserved_connections to accommodate autovacuum, the system should handle it for them. Were you planning on limiting the number of concurrent vacuum processes that could be running? If so, we could probably just increase superuser connections by that amount. If not, we might need to think of something else...

--
Jim Nasby                               [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Re: [HACKERS] autovacuum process handling
Jim C. Nasby wrote:
> On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
>> 4. Launcher will be a continuously-running process, akin to bgwriter;
>>    connected to shared memory.
>
> So would it use up a database connection?

No. It's connected to shared memory and has access to pgstats, but it's not connected to any database, so it's not counted. You'd say it has the same status as the bgwriter.

>> 5. Workers will be direct postmaster children; so postmaster will get
>>    SIGCHLD when worker dies.
>
> As part of this I think we need to make it more obvious how all of
> this ties into max_connections. Currently, autovac ties up one of the
> super-user connections whenever it's not asleep; these changes would
> presumably mean that more of those connections could be tied up.

Sure.

> Rather than forcing users to worry about adjusting max_connections and
> superuser_reserved_connections to accommodate autovacuum, the system
> should handle it for them. Were you planning on limiting the number of
> concurrent vacuum processes that could be running? If so, we could
> probably just increase superuser connections by that amount. If not,
> we might need to think of something else...

The fact that I'm currently narrowly focused on process handling means that I don't want to touch scheduling at all for now, so I'm gonna make it so that the launcher starts a worker only when no other worker is running. Thus only a single vacuum worker at any time. In the meantime you're welcome to think on the possible solutions to that problem, which we'll have to attack at some point in the (hopefully) near future ;-)

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
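The interim policy in that last paragraph amounts to a one-at-a-time gate in the launcher. A toy model of just the gate (illustrative only; the real launcher would learn about worker exit via a signal, not waitpid()):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned databases[] = {42, 43, 44};    /* pretend OIDs to vacuum */

        for (int i = 0; i < 3; i++)
        {
            pid_t worker = fork();
            if (worker == 0)
            {
                printf("worker %d: vacuuming database %u\n",
                       (int) getpid(), databases[i]);
                _exit(0);
            }
            /* the gate: no new worker until this one has finished */
            waitpid(worker, NULL, 0);
        }
        return 0;
    }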