Re: [HACKERS] kill -KILL: What happens?
On 27 May 2011 10:01, Florian Pflug wrote:
> Anyway, I'm glad to see that Peter Geoghegan has picked this up
> and turned this into an actual patch.
>
> Extremely cool!

Thanks Florian.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] kill -KILL: What happens?
On May 7, 2011, at 03:50, Robert Haas wrote:
> On Sat, Jan 15, 2011 at 10:44 AM, Florian Pflug wrote:
>> I've realized that POSIX actually *does* provide a way to receive a signal -
>> the SIGIO machinery. I've modified my test case to do that. To simplify
>> things, I've removed support for multiple life sign objects.
>>
>>
>
> Are you planning to develop this into a patch for 9.2?

Sorry for the extremely late answer - I received this mail while I was on
vacation, and then forgot to answer it once I came back :-(

Anyway, I'm glad to see that Peter Geoghegan has picked this up and turned
this into an actual patch.

Extremely cool!

best regards,
Florian Pflug
Re: [HACKERS] kill -KILL: What happens?
On Sat, Jan 15, 2011 at 10:44 AM, Florian Pflug wrote:
> On Jan 14, 2011, at 17:45, Robert Haas wrote:
>> On Fri, Jan 14, 2011 at 11:28 AM, Florian Pflug wrote:
>>> I gather that the behaviour we want is for normal backends to exit
>>> once the postmaster is gone, and for utility processes (bgwriter, ...)
>>> to exit once all the backends are gone.
>>>
>>> The test program I posted in this thread proves that FIFOs and select()
>>> can be used to implement this, if we're ready to check for EOF on the
>>> socket in CHECK_FOR_INTERRUPTS() every few seconds. Is this a viable
>>> route to take?
>>
>> I don't think there's much point in getting excited about the order in
>> which things exit. If we're agreed (and we seem to be, modulo Tom)
>> that the backends should exit quickly if the postmaster dies, then
>> worrying about whether the utility processes exit slightly before or
>> slightly after that doesn't excite me very much.
>
> I've realized that POSIX actually *does* provide a way to receive a signal -
> the SIGIO machinery. I've modified my test case to do that. To simplify
> things, I've removed support for multiple life sign objects.
>
> The code now does the following:
>
> The parent creates a pipe, sets its reading fd to O_NONBLOCK and O_ASYNC,
> and registers a SIGIO handler. The SIGIO handler checks a global flag, and
> simply sends a SIGTERM to its own pid if the flag is set.
>
> Child processes close the pipe's writing end (called "giving up ownership
> of the life sign" in the code) and set the global flag if they want to receive
> a SIGTERM once the parent is gone. The parent's health state can additionally
> be checked at any time by trying to read() from the pipe. read() returns
> EAGAIN as long as the parent is still alive and EOF otherwise.
>
> I'm not sure how portable this is. It compiles and runs fine on both my Linux
> machine (Ubuntu 10.04.01 LTS) and my laptop (OSX 10.6.6).
>
> In the EXEC_BACKEND case the pipe would need to be created with mkfifo() in
> the data directory, but otherwise things should work the same. Haven't tried
> that yet, though.
>
> Code attached. The output should be
>
>   Launched backend 8636
>   Launched backend 8637
>   Launched backend 8638
>   Backend 8636 detected live parent
>   Backend 8637 detected live parent
>   Backend 8638 detected live parent
>   Backend 8636 detected live parent
>   Backend 8637 detected live parent
>   Backend 8638 detected live parent
>   Parent exiting
>   Backend 8637 exiting after parent died
>   Backend 8638 exiting after parent died
>   Backend 8636 exiting after parent died
>
> if things work correctly.

Are you planning to develop this into a patch for 9.2?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] kill -KILL: What happens?
On Jan 14, 2011, at 17:45, Robert Haas wrote:
> On Fri, Jan 14, 2011 at 11:28 AM, Florian Pflug wrote:
>> I gather that the behaviour we want is for normal backends to exit
>> once the postmaster is gone, and for utility processes (bgwriter, ...)
>> to exit once all the backends are gone.
>>
>> The test program I posted in this thread proves that FIFOs and select()
>> can be used to implement this, if we're ready to check for EOF on the
>> socket in CHECK_FOR_INTERRUPTS() every few seconds. Is this a viable
>> route to take?
>
> I don't think there's much point in getting excited about the order in
> which things exit. If we're agreed (and we seem to be, modulo Tom)
> that the backends should exit quickly if the postmaster dies, then
> worrying about whether the utility processes exit slightly before or
> slightly after that doesn't excite me very much.

I've realized that POSIX actually *does* provide a way to receive a signal -
the SIGIO machinery. I've modified my test case to do that. To simplify
things, I've removed support for multiple life sign objects.

The code now does the following:

The parent creates a pipe, sets its reading fd to O_NONBLOCK and O_ASYNC,
and registers a SIGIO handler. The SIGIO handler checks a global flag, and
simply sends a SIGTERM to its own pid if the flag is set.

Child processes close the pipe's writing end (called "giving up ownership
of the life sign" in the code) and set the global flag if they want to receive
a SIGTERM once the parent is gone. The parent's health state can additionally
be checked at any time by trying to read() from the pipe. read() returns
EAGAIN as long as the parent is still alive and EOF otherwise.

I'm not sure how portable this is. It compiles and runs fine on both my Linux
machine (Ubuntu 10.04.01 LTS) and my laptop (OSX 10.6.6).

In the EXEC_BACKEND case the pipe would need to be created with mkfifo() in
the data directory, but otherwise things should work the same. Haven't tried
that yet, though.
Code attached. The output should be

  Launched backend 8636
  Launched backend 8637
  Launched backend 8638
  Backend 8636 detected live parent
  Backend 8637 detected live parent
  Backend 8638 detected live parent
  Backend 8636 detected live parent
  Backend 8637 detected live parent
  Backend 8638 detected live parent
  Parent exiting
  Backend 8637 exiting after parent died
  Backend 8638 exiting after parent died
  Backend 8636 exiting after parent died

if things work correctly.

best regards,
Florian Pflug

Attachment: liveness.c
Re: [HACKERS] kill -KILL: What happens?
On Sat, Jan 15, 2011 at 6:27 AM, Florian Pflug wrote:
> On Jan 14, 2011, at 17:45, Robert Haas wrote:
>> On Fri, Jan 14, 2011 at 11:28 AM, Florian Pflug wrote:
>>> I gather that the behaviour we want is for normal backends to exit
>>> once the postmaster is gone, and for utility processes (bgwriter, ...)
>>> to exit once all the backends are gone.
>>>
>>> The test program I posted in this thread proves that FIFOs and select()
>>> can be used to implement this, if we're ready to check for EOF on the
>>> socket in CHECK_FOR_INTERRUPTS() every few seconds. Is this a viable
>>> route to take?
>>
>> I don't think there's much point in getting excited about the order in
>> which things exit. If we're agreed (and we seem to be, modulo Tom)
>> that the backends should exit quickly if the postmaster dies, then
>> worrying about whether the utility processes exit slightly before or
>> slightly after that doesn't excite me very much.
>
> Tom seems to think that as our utility processes gain importance, one day
> we might require one to outlive all the backends, and that whatever solution
> we adopt should allow us to arrange for that. Or at least this is how I
> understood him.

Well, there's certainly ONE of those already: the logging collector.
But it already has its own solution to this problem.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
On Jan 14, 2011, at 17:45, Robert Haas wrote:
> On Fri, Jan 14, 2011 at 11:28 AM, Florian Pflug wrote:
>> I gather that the behaviour we want is for normal backends to exit
>> once the postmaster is gone, and for utility processes (bgwriter, ...)
>> to exit once all the backends are gone.
>>
>> The test program I posted in this thread proves that FIFOs and select()
>> can be used to implement this, if we're ready to check for EOF on the
>> socket in CHECK_FOR_INTERRUPTS() every few seconds. Is this a viable
>> route to take?
>
> I don't think there's much point in getting excited about the order in
> which things exit. If we're agreed (and we seem to be, modulo Tom)
> that the backends should exit quickly if the postmaster dies, then
> worrying about whether the utility processes exit slightly before or
> slightly after that doesn't excite me very much.

Tom seems to think that as our utility processes gain importance, one day
we might require one to outlive all the backends, and that whatever solution
we adopt should allow us to arrange for that. Or at least this is how I
understood him.

That part can also easily be left out by using only one FIFO instead of
two, kept open for writing only in the postmaster.

best regards,
Florian Pflug
Re: [HACKERS] kill -KILL: What happens?
On Fri, Jan 14, 2011 at 11:28 AM, Florian Pflug wrote:
> I gather that the behaviour we want is for normal backends to exit
> once the postmaster is gone, and for utility processes (bgwriter, ...)
> to exit once all the backends are gone.
>
> The test program I posted in this thread proves that FIFOs and select()
> can be used to implement this, if we're ready to check for EOF on the
> socket in CHECK_FOR_INTERRUPTS() every few seconds. Is this a viable
> route to take?

I don't think there's much point in getting excited about the order in
which things exit. If we're agreed (and we seem to be, modulo Tom)
that the backends should exit quickly if the postmaster dies, then
worrying about whether the utility processes exit slightly before or
slightly after that doesn't excite me very much.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
On Jan 14, 2011, at 17:22, Kevin Grittner wrote:
> Alvaro Herrera wrote:
>
>> If postmaster dies, and then another backend crashes, then your
>> backend running "your honking big query" could run across
>> corrupted state and then you'd be in serious trouble.
>
> Worst of all, it could give bogus results without error. I really
> don't see a production use case for letting backends continue after
> postmaster failure -- unless you only kinda, sorta care whether
> committed data is actually retrievable or reported data is actually
> accurate.

I gather that the behaviour we want is for normal backends to exit
once the postmaster is gone, and for utility processes (bgwriter, ...)
to exit once all the backends are gone.

The test program I posted in this thread proves that FIFOs and select()
can be used to implement this, if we're ready to check for EOF on the
socket in CHECK_FOR_INTERRUPTS() every few seconds. Is this a viable
route to take?

best regards,
Florian Pflug
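Checking for EOF "every few seconds" from CHECK_FOR_INTERRUPTS() implies a cheap gate in front of the actual probe. A minimal sketch of such a gate, assuming the caller passes in a monotonic timestamp (for example from clock_gettime(CLOCK_MONOTONIC, ...)); probe_throttle and probe_due() are hypothetical names, not existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state for rate-limiting the life-sign probe. */
typedef struct
{
    struct timespec last_probe; /* time of the most recent probe */
    long interval_ms;           /* minimum spacing between probes */
    bool probed_once;           /* false until the first probe has run */
} probe_throttle;

/* Milliseconds from `from` to `to`, truncated toward zero. */
static long elapsed_ms(const struct timespec *from, const struct timespec *to)
{
    return (long) (to->tv_sec - from->tv_sec) * 1000L
         + (to->tv_nsec - from->tv_nsec) / 1000000L;
}

/*
 * Returns true, and records `now`, if a probe is due; otherwise returns
 * false so the caller can skip the probe's system call entirely.
 */
static bool probe_due(probe_throttle *t, const struct timespec *now)
{
    assert(t != NULL && now != NULL);
    if (t->probed_once && elapsed_ms(&t->last_probe, now) < t->interval_ms)
        return false;
    t->last_probe = *now;
    t->probed_once = true;
    return true;
}
```

Passing the time in, instead of reading the clock inside, keeps the helper deterministic. Whether reading the clock on every CHECK_FOR_INTERRUPTS() call is itself cheap enough would need measuring.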
Re: [HACKERS] kill -KILL: What happens?
Alvaro Herrera wrote:

> If postmaster dies, and then another backend crashes, then your
> backend running "your honking big query" could run across
> corrupted state and then you'd be in serious trouble.

Worst of all, it could give bogus results without error. I really
don't see a production use case for letting backends continue after
postmaster failure -- unless you only kinda, sorta care whether
committed data is actually retrievable or reported data is actually
accurate.

-Kevin
Re: [HACKERS] kill -KILL: What happens?
Excerpts from Robert Haas's message of Fri Jan 14 00:03:53 -0300 2011:
> On Thu, Jan 13, 2011 at 8:28 PM, Tom Lane wrote:
> > True. It strikes me also that the postmaster does provide some services
> > other than accepting new connections:
> >
> > * ensuring that everybody gets killed if a backend crashes
>
> > While you could probably live without these in the scenario of "let my
> > honking big query finish before restarting", you would not want to do
> > without them in unattended operation.
>
> Yep. I'm pretty doubtful that you're going to want them even in that
> case, but you're surely not going to want them in unattended
> operation.

I'm sure you don't want that. The reason postmaster causes a restart of
all backends in case one of them crashes is that it could have left some
corrupted state behind. If postmaster dies, and then another backend
crashes, then your backend running "your honking big query" could run
across corrupted state and then you'd be in serious trouble.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 8:28 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jan 13, 2011 at 8:10 PM, Tom Lane wrote:
>>> Florian Pflug writes:
>>>> So maybe there should be a GUC for this?
>
>>> No need (and rather inflexible anyway). If you don't want an orphaned
>>> backend to continue, you send it SIGTERM.
>
>> It is not easy to make this work in such a way that you can ensure a
>> clean, automatic restart of PostgreSQL after a postmaster death.
>> Which is what at least some people want.
>
> True. It strikes me also that the postmaster does provide some services
> other than accepting new connections:
>
> * ensuring that everybody gets killed if a backend crashes
>
> * respawning autovac launcher and other processes that might exit
> harmlessly
>
> * is there still any cross-backend signaling that goes through the
> postmaster? We got rid of the sinval case, but I don't recall if
> there's others.
>
> While you could probably live without these in the scenario of "let my
> honking big query finish before restarting", you would not want to do
> without them in unattended operation.

Yep. I'm pretty doubtful that you're going to want them even in that
case, but you're surely not going to want them in unattended
operation.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
Robert Haas writes:
> On Thu, Jan 13, 2011 at 8:10 PM, Tom Lane wrote:
>> Florian Pflug writes:
>>> So maybe there should be a GUC for this?
>> No need (and rather inflexible anyway). If you don't want an orphaned
>> backend to continue, you send it SIGTERM.

> It is not easy to make this work in such a way that you can ensure a
> clean, automatic restart of PostgreSQL after a postmaster death.
> Which is what at least some people want.

True. It strikes me also that the postmaster does provide some services
other than accepting new connections:

* ensuring that everybody gets killed if a backend crashes

* respawning autovac launcher and other processes that might exit
harmlessly

* is there still any cross-backend signaling that goes through the
postmaster? We got rid of the sinval case, but I don't recall if
there's others.

While you could probably live without these in the scenario of "let my
honking big query finish before restarting", you would not want to do
without them in unattended operation.

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 8:10 PM, Tom Lane wrote:
> Florian Pflug writes:
>> I don't believe there's one right answer to that.
>
> Right. Force-kill presumes there is only one right answer.
>
>> Assume postgres is driving a website, and the postmaster crashes shortly
>> after a pg_dump run started. You probably won't want your website to be
>> offline while pg_dump is finishing its backup.
>
>> If, on the other hand, your data warehousing database is running a
>> multi-hour query, you might prefer that query to finish, even at the price
>> of not being able to accept new connections.
>
>> So maybe there should be a GUC for this?
>
> No need (and rather inflexible anyway). If you don't want an orphaned
> backend to continue, you send it SIGTERM.

It is not easy to make this work in such a way that you can ensure a
clean, automatic restart of PostgreSQL after a postmaster death.
Which is what at least some people want.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 7:32 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane wrote:
>>> Killing active sessions when it's not absolutely necessary is not an
>>> asset.
>
>> That's a highly arguable point and I certainly don't agree with it.
>
> Your examples appear to rely on the assumption that background processes
> exit instantly when the postmaster dies. Which they should not.

But they do.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
Florian Pflug writes:
> I don't believe there's one right answer to that.

Right. Force-kill presumes there is only one right answer.

> Assume postgres is driving a website, and the postmaster crashes shortly
> after a pg_dump run started. You probably won't want your website to be
> offline while pg_dump is finishing its backup.
> If, on the other hand, your data warehousing database is running a
> multi-hour query, you might prefer that query to finish, even at the price
> of not being able to accept new connections.
> So maybe there should be a GUC for this?

No need (and rather inflexible anyway). If you don't want an orphaned
backend to continue, you send it SIGTERM.

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
On Jan 14, 2011, at 01:32, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane wrote:
>>> Killing active sessions when it's not absolutely necessary is not an
>>> asset.
>
>> That's a highly arguable point and I certainly don't agree with it.
>
> Your examples appear to rely on the assumption that background processes
> exit instantly when the postmaster dies. Which they should not.

Even if they stay around, no new connections will be possible once the
postmaster is gone. So this really comes down to what somebody perceives
to be the bigger problem - new connections failing or existing connections
being terminated.

I don't believe there's one right answer to that.

Assume postgres is driving a website, and the postmaster crashes shortly
after a pg_dump run started. You probably won't want your website to be
offline while pg_dump is finishing its backup.

If, on the other hand, your data warehousing database is running a
multi-hour query, you might prefer that query to finish, even at the price
of not being able to accept new connections.

So maybe there should be a GUC for this?

best regards,
Florian Pflug
Re: [HACKERS] kill -KILL: What happens?
Robert Haas writes:
> On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane wrote:
>> Killing active sessions when it's not absolutely necessary is not an
>> asset.

> That's a highly arguable point and I certainly don't agree with it.

Your examples appear to rely on the assumption that background processes
exit instantly when the postmaster dies. Which they should not.

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 03:29:13PM -0800, Jeff Davis wrote:
> On Thu, 2011-01-13 at 11:14 -0800, David Fetter wrote:
> > I get that we can't prevent all pilot error, but I was hoping we
> > could bullet-proof this a little more, especially in light of a
> > certain extremely popular server OS's OOM killer's default
> > behavior.
>
> That's a good point. I'm not sure how much action can reasonably be
> taken, however.

We may find out from Florian's experiments :)

> > Yes, I get that that behavior is crazy, and stupid, and that
> > people should shut it off, but it *is* our problem if we let the
> > postmaster start (or continue) when it's set that way.
>
> As an aside, Linux has actually changed the heuristic:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Great! In a decade or so, no more servers will be running with an
earlier kernel ;)

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] kill -KILL: What happens?
On Thu, 2011-01-13 at 11:14 -0800, David Fetter wrote:
> I get that we can't prevent all pilot error, but I was hoping we could
> bullet-proof this a little more, especially in light of a certain
> extremely popular server OS's OOM killer's default behavior.

That's a good point. I'm not sure how much action can reasonably be
taken, however.

> Yes, I get that that behavior is crazy, and stupid, and that people
> should shut it off, but it *is* our problem if we let the postmaster
> start (or continue) when it's set that way.

As an aside, Linux has actually changed the heuristic:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Regards,
Jeff Davis
Re: [HACKERS] kill -KILL: What happens?
On Jan 13, 2011, at 21:42, Tom Lane wrote:
> Aidan Van Dyk writes:
>> If postmaster has a few fds to spare, what about having it open a pipe
>> to every child it spawns. It never has to read/write to it, but
>> postmaster closing will signal the client's fd. The client just has
>> to pop the fd into whatever normal poll/select event handling it uses
>> to notice when the "parent's pipe" is closed.
>
> Hmm. Or more generally: there's one FIFO. The postmaster holds both
> sides open. Backends hold the write side open. (They can close the
> read side, but that would just be to free up a FD.) Background children
> close the write side. Now a background process can use EOF on the read
> side of the FIFO to tell it that postmaster and all backends have
> exited. You still don't get a signal, but at least the condition you're
> testing for is the one we actually want and not an approximation.

I was thinking along a similar line, and put together a small test case to
prove that this actually works.

The attached test program simulates the interactions of a parent process
(think postmaster), some utility processes (think walwriter, bgwriter, ...)
and some backends. It uses two pairs of fds created with pipe(), called
LifeSignParent and LifeSignParentBackends.

The writing end of the former is held open only in the parent process,
while the writing end of the latter is held open in the parent process and
all regular backend processes. Backend processes use select() to monitor
the reading end of the LifeSignParent fd pair. Since nothing is ever
written to the writing end, the fd becomes readable only when the parent
exits, because that is how select() signals EOF. Once that happens the
backend exits. The utility processes do the same, but monitor the reading
end of LifeSignParentBackends, and thus exit only after the parent and all
regular backends have died.

Since the life sign checking uses select(), any place that already uses
select() can easily check for vanishing life signs.
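The select()-based check described above can be sketched as a non-blocking probe. This is a minimal illustration rather than the attached test program; life_sign_vanished() is a hypothetical name, and it relies on the stated assumption that nothing is ever written to the pipe, so "readable" can only mean EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Non-blocking check of a life-sign pipe. The read end becomes readable
 * only when the last process holding the write end open exits (or closes
 * it), because select() reports EOF as "readable".
 */
static bool life_sign_vanished(int read_fd)
{
    fd_set rfds;
    struct timeval timeout = {0, 0};   /* zero timeout: poll, don't block */
    char c;

    assert(read_fd >= 0 && read_fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);
    if (select(read_fd + 1, &rfds, NULL, NULL, &timeout) <= 0)
        return false;                  /* not readable (or EINTR): writers remain */

    /* Readable although nothing was ever written: read() confirms EOF. */
    return read(read_fd, &c, 1) == 0;
}
```

In the two-pipe layout, a backend would probe the read end of LifeSignParent and a utility process the read end of LifeSignParentBackends; the helper is the same for both. Passing NULL instead of a zero timeout would make select() block until the life sign vanishes.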
CHECK_FOR_INTERRUPTS could simply check the life sign once every few
seconds.

If we want an absolutely reliable signal instead of checking in
CHECK_FOR_INTERRUPTS, every backend would need to launch a monitor
subprocess which monitors the life sign, and exits once it vanishes. The
backend would then get a SIGCHLD once the postmaster dies. Seems like
overkill, though.

The whole thing won't work on Windows, since even if it's got a pipe() or
socketpair() call, with EXEC_BACKEND there's no way of transferring these
fds to the child processes. AFAIK, however, Windows has other means with
which such life signs can be implemented. For example, I seem to remember
that WaitForMultipleObjects() can be used to wait for process-related
events. But Windows really isn't my area of expertise...

I have tested this on the latest Ubuntu LTS release (10.04.1) as well as
Mac OS X 10.6.6, and it seems to work correctly on both systems. I'd be
happy to hear from anyone who has access to other systems on whether this
works or not.

The expected output is

  Launched utility 5095
  Launched backend 5097
  Launched utility 5096
  Launched backend 5099
  Launched backend 5098
  Utility 5095 detected live parent or backend
  Backend 5097 detected live parent
  Utility 5096 detected live parent or backend
  Backend 5099 detected live parent
  Backend 5098 detected live parent
  Parent exiting
  Backend 5097 exiting after parent died
  Backend 5098 exiting after parent died
  Backend 5099 exiting after parent died
  Utility 5096 exiting after parent and backends died
  Utility 5095 exiting after parent and backends died

Everything after "Parent exiting" might be interleaved with a shell prompt,
of course.

best regards,
Florian Pflug

Attachment: liveness.c
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 02:21:44PM -0500, Tom Lane wrote:
> David Fetter writes:
> > I get that we can't prevent all pilot error, but I was hoping we
> > could bullet-proof this a little more, especially in light of a
> > certain extremely popular server OS's OOM killer's default
> > behavior.
>
> > Yes, I get that that behavior is crazy, and stupid, and that
> > people should shut it off, but it *is* our problem if we let the
> > postmaster start (or continue) when it's set that way.
>
> Packagers who are paying attention have fixed that ;-)

Are we privileging packaged over unpackaged? Some distro over others? ;)

Cheers,
David.
--
David Fetter
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane wrote:
> Robert Haas writes:
>> I strongly believe you're in the minority on that one, for the same
>> reasons that I don't think most people would agree with your notion of
>> what should be the default shutdown mode. A database that can't
>> accept new connections is a liability, not an asset.
>
> Killing active sessions when it's not absolutely necessary is not an
> asset.

That's a highly arguable point and I certainly don't agree with it. A
database with no postmaster and no background processes can't possibly be
expected to function in any sort of halfway reasonable way. In
particular:

1. No checkpoints will occur, so the time required for recovery will grow
longer without bound.

2. All walsenders will exit, so no transactions will be replicated to
standbys.

3. Transactions committed asynchronously won't be flushed to disk, and are
lost entirely unless enough other WAL activity occurs before the last
backend dies to force a WAL write.

4. Autovacuum won't run until the system is properly restarted, and to
make matters worse there's no statistics collector, so the information
that might trigger a later run will be lost also.

5. At some point, you'll run out of clean buffers, after which performance
will start to suck as backends have to do their own writes.

6. At some probably later point, the fsync request queue will fill up,
after which performance will go into the toilet. On 9.1devel, this takes
less than a minute of moderate activity on my MacOS X machine.

All in all, running for any significant period of time in this state is
likely a recipe for disaster, even if for some inexplicable reason you
don't care about the fact that the system won't accept any new
connections.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 09:18:06PM +0100, Florian Pflug wrote:
> On Jan 13, 2011, at 21:01, Aidan Van Dyk wrote:
> > On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas wrote:
> >> I'm not convinced. I was thinking that we could simply treat it
> >> like SIGQUIT, if it's available. I doubt there's a real use case
> >> for continuing to run queries after the postmaster and all the
> >> background processes are dead. Expedited death seems like much
> >> better behavior. Even checking PostmasterIsAlive() once per
> >> query would be reasonable, except that it'd add a system call to
> >> check for a condition that almost never holds, which I'm not
> >> eager to do.
> >
> > If postmaster has a few fds to spare, what about having it open a
> > pipe to every child it spawns. It never has to read/write to it,
> > but postmaster closing will signal the client's fd. The client
> > just has to pop the fd into whatever normal poll/select event
> > handling it uses to notice when the "parent's pipe" is closed.
>
> I just started to experiment with that idea, and wrote a small test
> program to check if that'd work. I'll post the results when I'm
> done.

Great! :)

Cheers,
David.
--
David Fetter
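For comparison with the pipe idea, a "parent still alive?" check that costs a single system call can be built on getppid(): once a parent exits, its children are reparented, so getppid() stops returning the original parent's pid. This is a sketch under that assumption, not PostgreSQL's PostmasterIsAlive(); parent_still_alive() is a hypothetical name, and the technique only covers direct children.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper. `expected_parent` should be captured by the parent
 * (getpid() before fork()) and handed to the child, so that a parent that
 * dies before the child first runs is still detected.
 */
static bool parent_still_alive(pid_t expected_parent)
{
    /* Debug-only sanity check: a process is never its own parent. */
    assert(expected_parent != getpid());

    /* One system call: after the parent exits, the child is reparented
     * (to init or a subreaper), so getppid() no longer matches. */
    return getppid() == expected_parent;
}
```

Unlike the pipe-based life sign, this has to be polled; it cannot wake up a process that is blocked in select() or poll().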
Re: [HACKERS] kill -KILL: What happens?
Aidan Van Dyk writes:
> If postmaster has a few fds to spare, what about having it open a pipe
> to every child it spawns. It never has to read/write to it, but
> postmaster closing will signal the client's fd. The client just has
> to pop the fd into whatever normal poll/select event handling it uses
> to notice when the "parent's pipe" is closed.

Hmm. Or more generally: there's one FIFO. The postmaster holds both sides open. Backends hold the write side open. (They can close the read side, but that would just be to free up an FD.) Background children close the write side. Now a background process can use EOF on the read side of the FIFO to tell it that postmaster and all backends have exited. You still don't get a signal, but at least the condition you're testing for is the one we actually want and not an approximation.

regards, tom lane
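Tom's one-FIFO scheme relies on a standard pipe property. read() on the read end returns 0 (EOF) only once every descriptor for the write end has been closed, in every process that holds one. The sketch below makes some assumptions. An anonymous pipe() stands in for the FIFO. The function names are hypothetical, and the fork-based "backend" is illustrative rather than PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block on the read end until every process holding the write end has
 * closed it or exited. Nothing is ever written to the pipe, so read()
 * can only return 0 (EOF, no writers left) or -1 (error).
 */
static int wait_for_all_writers_gone(int read_fd)
{
    char buf;
    ssize_t n;

    do {
        n = read(read_fd, &buf, 1);
    } while (n < 0 && errno == EINTR);
    return (int) n;
}

/*
 * Hypothetical demo: a "backend" child keeps the write end open for a
 * second. The caller, playing a background process, drops its own copy
 * of the write end and then waits for EOF. Returns 0 on EOF.
 */
static int demo_fifo_eof(void)
{
    int fds[2];
    pid_t backend;
    int result;

    if (pipe(fds) != 0)
        return -1;

    backend = fork();
    if (backend < 0)
        return -1;
    if (backend == 0) {
        close(fds[0]);   /* backends may close the read side */
        sleep(1);        /* "work" while still holding fds[1] */
        _exit(0);        /* exiting closes fds[1] implicitly */
    }

    close(fds[1]);       /* background process gives up the write side */
    result = wait_for_all_writers_gone(fds[0]);
    close(fds[0]);
    waitpid(backend, NULL, 0);
    return result;
}
```

In the real layout the postmaster would also hold a write descriptor. EOF therefore only arrives after the postmaster *and* the last backend are gone, which is the condition Tom wants.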
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 21:37, Tom Lane wrote:
> Robert Haas writes:
>> I strongly believe you're in the minority on that one, for the same
>> reasons that I don't think most people would agree with your notion of
>> what should be the default shutdown mode. A database that can't
>> accept new connections is a liability, not an asset.
>
> Killing active sessions when it's not absolutely necessary is not an
> asset.

It certainly can be. Consider any connection pooling scenario, which would represent the vast majority of larger deployments today - if you don't kill the sessions, they will never go away.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] kill -KILL: What happens?
Robert Haas writes:
> I strongly believe you're in the minority on that one, for the same
> reasons that I don't think most people would agree with your notion of
> what should be the default shutdown mode. A database that can't
> accept new connections is a liability, not an asset.

Killing active sessions when it's not absolutely necessary is not an asset.

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
Robert Haas wrote:

> A database that can't accept new connections is a liability, not
> an asset.

+1

I have so far been unable to imagine a use case for the production databases I use where I would prefer to see backends continue after postmaster failure.

-Kevin
Re: [HACKERS] kill -KILL: What happens?
On Jan13, 2011, at 21:01 , Aidan Van Dyk wrote:
> On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas wrote:
>> I'm not convinced. I was thinking that we could simply treat it like
>> SIGQUIT, if it's available. I doubt there's a real use case for
>> continuing to run queries after the postmaster and all the background
>> processes are dead. Expedited death seems like much better behavior.
>> Even checking PostmasterIsAlive() once per query would be reasonable,
>> except that it'd add a system call to check for a condition that
>> almost never holds, which I'm not eager to do.
>
> If postmaster has a few fds to spare, what about having it open a pipe
> to every child it spawns. It never has to read/write to it, but
> postmaster closing will signal the client's fd. The client just has
> to pop the fd into whatever normal poll/select event handling it uses
> to notice when the "parent's pipe" is closed.

I just started to experiment with that idea, and wrote a small test program to check if that'd work. I'll post the results when I'm done.

best regards,
Florian Pflug
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 3:01 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane wrote:
>>> I wonder whether we could have some sort of latch-like counter that
>>> would count the number of active backends and deliver signals when the
>>> count went to zero. However, if the goal is to defend against random
>>> applications of SIGKILL, there's probably no way to make this reliable
>>> in userspace.
>
>> I don't think you can get there 100%. We could, however, make a rule
>> that when a background process fails a PostmasterIsAlive() check, it
>> sends SIGQUIT to everyone it can find in the ProcArray, which would at
>> least ensure a timely exit in most real-world cases.
>
> You're going in the wrong direction there: we're trying to have the
> system remain sane when the postmaster crashes, not see how quickly
> it can screw up every remaining session.

I strongly believe you're in the minority on that one, for the same reasons that I don't think most people would agree with your notion of what should be the default shutdown mode. A database that can't accept new connections is a liability, not an asset.

--
Robert Haas
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas wrote:
> I'm not convinced. I was thinking that we could simply treat it like
> SIGQUIT, if it's available. I doubt there's a real use case for
> continuing to run queries after the postmaster and all the background
> processes are dead. Expedited death seems like much better behavior.
> Even checking PostmasterIsAlive() once per query would be reasonable,
> except that it'd add a system call to check for a condition that
> almost never holds, which I'm not eager to do.

If postmaster has a few fds to spare, what about having it open a pipe to every child it spawns. It never has to read/write to it, but postmaster closing will signal the client's fd. The client just has to pop the fd into whatever normal poll/select event handling it uses to notice when the "parent's pipe" is closed.

A FIFO would allow postmaster to not need as many file handles, and clients reading the FIFO would notice when the writer (postmaster) closes it.

a.

--
Aidan Van Dyk                                   Create like a god,
ai...@highrise.ca                               command like a king,
http://www.highrise.ca/                         work like a slave.
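Aidan's "parent's pipe" can also be checked without blocking, which is the form a backend would need between queries. Below is a minimal sketch with poll() and a zero timeout; the function name is hypothetical. Platforms differ in how they report that the last writer has gone. Linux sets POLLHUP, while some systems set POLLIN, after which read() returns 0. Since nothing is ever written to the pipe, either event can only mean EOF.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Return 1 if some process still holds the write end of the parent's
 * pipe, 0 if every writer is gone, -1 on error. The pipe is never
 * written to, so "readable" or "hung up" both mean EOF.
 */
static int parent_pipe_alive(int read_fd)
{
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, 0);   /* timeout 0: never blocks */
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 1;                /* no event: a writer is still open */
    if (pfd.revents & (POLLHUP | POLLIN))
        return 0;                /* writer side fully closed */
    return -1;                   /* POLLERR / POLLNVAL */
}
```

In a backend that already blocks in poll() or select() on the client socket, the same descriptor could be added to that set instead. Parent death would then wake the process without a separate per-query system call, which addresses Robert's cost concern.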
Re: [HACKERS] kill -KILL: What happens?
Robert Haas writes:
> On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane wrote:
>> I wonder whether we could have some sort of latch-like counter that
>> would count the number of active backends and deliver signals when the
>> count went to zero. However, if the goal is to defend against random
>> applications of SIGKILL, there's probably no way to make this reliable
>> in userspace.
> I don't think you can get there 100%. We could, however, make a rule
> that when a background process fails a PostmasterIsAlive() check, it
> sends SIGQUIT to everyone it can find in the ProcArray, which would at
> least ensure a timely exit in most real-world cases.

You're going in the wrong direction there: we're trying to have the system remain sane when the postmaster crashes, not see how quickly it can screw up every remaining session.

BTW, in Unix-land we could maybe rely on SysV semaphores' SEM_UNDO feature to keep a trustworthy count of how many live processes there are. But I don't know whether there's anything comparable for Windows.

regards, tom lane
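Tom's SEM_UNDO idea can be illustrated with a small sketch. Each process does a +1 on a semaphore with SEM_UNDO. When the process exits, even by SIGKILL, the kernel reverts its adjustment, so the semaphore value tracks the number of live registered processes. The helper names and the single-semaphore layout are hypothetical. The `union sem_arg` declaration follows the Linux convention that callers define semctl()'s fourth argument themselves. This is SysV-only; it says nothing about Windows, which is the open question in Tom's message.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fourth argument of semctl(). Linux makes callers declare it; the name
 * sem_arg (not semun) avoids clashing with systems that predefine it. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Count the caller as live. SEM_UNDO makes the kernel revert this +1
 * when the process exits, whatever the cause, SIGKILL included. */
static int register_live_process(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}

static int live_process_count(int semid)
{
    return semctl(semid, 0, GETVAL);
}

/* Hypothetical demo: a child registers itself and is then SIGKILLed.
 * Stores the count seen while the child ran in *before and returns the
 * count seen after it died, or -1 on error. */
static int demo_sem_undo(int *before)
{
    union sem_arg arg = { .val = 0 };
    int sync_fds[2];
    char byte;
    pid_t child;
    int semid, after;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;
    if (semctl(semid, 0, SETVAL, arg) < 0 || pipe(sync_fds) != 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    child = fork();
    if (child == 0) {
        close(sync_fds[0]);
        if (register_live_process(semid) != 0)
            _exit(1);                 /* parent's read() then sees EOF */
        if (write(sync_fds[1], "x", 1) != 1)
            _exit(1);
        pause();                      /* stay alive until killed */
        _exit(1);
    }

    close(sync_fds[1]);
    if (child < 0 || read(sync_fds[0], &byte, 1) != 1) {
        if (child > 0) {
            kill(child, SIGKILL);
            waitpid(child, NULL, 0);
        }
        close(sync_fds[0]);
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    *before = live_process_count(semid);   /* child is registered */
    kill(child, SIGKILL);                   /* no chance to clean up */
    waitpid(child, NULL, 0);
    after = live_process_count(semid);      /* kernel applied the undo */

    close(sync_fds[0]);
    semctl(semid, 0, IPC_RMID);
    return after;
}
```

A watcher that only reads the value, rather than blocking on it, would still need to poll. Getting a *signal* when the count reaches zero is not something this sketch provides.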
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane wrote:
>>> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
>>> It sucks because you don't get a signal on parent death. With the
>>> arrival of the latch code, having to check for PostmasterIsAlive
>>> frequently is the only reason for an idle background process to consume
>>> CPU at all.
>
>> What we really need is SIGPARENT. I wonder if the Linux folks would
>> consider adding such a thing. Might be useful to others as well.
>
> That's pretty much a dead-end idea unfortunately; it would never be
> portable enough to let us change our system structure to rely on it.
> Even more to the point, "go away when the postmaster does" isn't
> really the behavior we want anyway. "Go away when the last backend
> does" is what we want.

I'm not convinced. I was thinking that we could simply treat it like SIGQUIT, if it's available. I doubt there's a real use case for continuing to run queries after the postmaster and all the background processes are dead. Expedited death seems like much better behavior. Even checking PostmasterIsAlive() once per query would be reasonable, except that it'd add a system call to check for a condition that almost never holds, which I'm not eager to do.

> I wonder whether we could have some sort of latch-like counter that
> would count the number of active backends and deliver signals when the
> count went to zero. However, if the goal is to defend against random
> applications of SIGKILL, there's probably no way to make this reliable
> in userspace.

I don't think you can get there 100%. We could, however, make a rule that when a background process fails a PostmasterIsAlive() check, it sends SIGQUIT to everyone it can find in the ProcArray, which would at least ensure a timely exit in most real-world cases.

> Another idea is to have a "postmaster minder" process that respawns the
> postmaster when it's killed. The hard part of that is that the minder
> can't be connected to shared memory (else its OOM cross-section is just
> as big as the postmaster's), and that makes it difficult for it to tell
> when all the children have gone away. I suppose it could be coded to
> just retry every few seconds until success. This doesn't improve the
> behavior of background processes at all, though.

It hardly seems worth it. Given a reliable interlock against multiple postmasters, the real concern is making sure that a half-dead postmaster gets itself all-dead quickly so that the DBA can start up a new one before he gets fired.

--
Robert Haas
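Robert's proposed rule can be sketched under some assumptions. The "postmaster is alive" test is approximated by comparing getppid() with the original parent PID; a reparented child sees a different value. A plain array of PIDs stands in for the ProcArray. All function names are hypothetical, and Tom's reply objects to the policy itself.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a liveness check by a direct child: once the parent dies
 * the child is reparented, so getppid() no longer matches. */
static int parent_still_alive(pid_t original_parent)
{
    return getppid() == original_parent;
}

/* Send SIGQUIT to every listed pid except ourselves and return how many
 * signals were delivered. Non-positive entries are skipped because
 * kill() would interpret them as process groups; pids that already
 * exited make kill() fail and are simply not counted. */
static int sigquit_all(const pid_t *pids, size_t n)
{
    pid_t self = getpid();
    int sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (pids[i] <= 0 || pids[i] == self)
            continue;
        if (kill(pids[i], SIGQUIT) == 0)
            sent++;
    }
    return sent;
}

/* The rule as a background process might run it periodically: do
 * nothing while the parent lives, otherwise SIGQUIT everyone else. */
static int check_parent_and_quit_others(pid_t original_parent,
                                        const pid_t *pids, size_t n)
{
    if (parent_still_alive(original_parent))
        return 0;
    return sigquit_all(pids, n);
}
```

The getppid() comparison is cheap but only works for direct children of the postmaster. A process that is not a direct child would need a different test.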
Re: [HACKERS] kill -KILL: What happens?
Robert Haas writes:
> On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane wrote:
>> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
>> It sucks because you don't get a signal on parent death. With the
>> arrival of the latch code, having to check for PostmasterIsAlive
>> frequently is the only reason for an idle background process to consume
>> CPU at all.
> What we really need is SIGPARENT. I wonder if the Linux folks would
> consider adding such a thing. Might be useful to others as well.

That's pretty much a dead-end idea unfortunately; it would never be portable enough to let us change our system structure to rely on it. Even more to the point, "go away when the postmaster does" isn't really the behavior we want anyway. "Go away when the last backend does" is what we want.

I wonder whether we could have some sort of latch-like counter that would count the number of active backends and deliver signals when the count went to zero. However, if the goal is to defend against random applications of SIGKILL, there's probably no way to make this reliable in userspace.

Another idea is to have a "postmaster minder" process that respawns the postmaster when it's killed. The hard part of that is that the minder can't be connected to shared memory (else its OOM cross-section is just as big as the postmaster's), and that makes it difficult for it to tell when all the children have gone away. I suppose it could be coded to just retry every few seconds until success. This doesn't improve the behavior of background processes at all, though.

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane wrote:
> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
> It sucks because you don't get a signal on parent death. With the
> arrival of the latch code, having to check for PostmasterIsAlive
> frequently is the only reason for an idle background process to consume
> CPU at all.

What we really need is SIGPARENT. I wonder if the Linux folks would consider adding such a thing. Might be useful to others as well.

--
Robert Haas
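Linux already has something close to the SIGPARENT Robert describes: prctl(PR_SET_PDEATHSIG, sig) asks the kernel to deliver `sig` to the caller when its parent dies. Two caveats apply. It is Linux-only, which is Tom's portability objection in his reply. And the kernel treats the *thread* that created the child as the parent, not the whole process. The helper name below is hypothetical. The getppid() re-check covers the window in which the parent died before prctl() took effect.

```c
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel to send `sig` to us when our parent dies (Linux only).
 * Returns 0 on success, -1 if prctl() fails. */
static int request_parent_death_signal(int sig, pid_t expected_parent)
{
    if (prctl(PR_SET_PDEATHSIG, (unsigned long) sig) != 0)
        return -1;
    /* The parent may have died before the setting took effect; in that
     * case no signal will ever arrive, so deliver it ourselves. */
    if (getppid() != expected_parent)
        raise(sig);
    return 0;
}
```

A child would call this right after fork(), passing the PID its parent had before forking. With SIGQUIT as the signal, this is roughly the "treat it like SIGQUIT, if it's available" behavior Robert suggests later in the thread.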
Re: [HACKERS] kill -KILL: What happens?
David Fetter writes:
> I get that we can't prevent all pilot error, but I was hoping we could
> bullet-proof this a little more, especially in light of a certain
> extremely popular server OS's OOM killer's default behavior.

> Yes, I get that that behavior is crazy, and stupid, and that people
> should shut it off, but it *is* our problem if we let the postmaster
> start (or continue) when it's set that way.

Packagers who are paying attention have fixed that ;-)

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
Florian Pflug writes:
> Couldn't normal backends call PostmasterIsAlive and exit if not, just
> like the startup process, the stats collector, autovacuum, bgwriter,
> walwriter, walreceiver, walsender and the wal archiver already do?
> I assumed they do, but now that I grepped the code it seems they don't.

That's intentional: they keep going until the user closes the session or someone sends them a signal to do otherwise. The other various background processes have to watch PostmasterIsAlive because there is no session to close.

Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use. It sucks because you don't get a signal on parent death. With the arrival of the latch code, having to check for PostmasterIsAlive frequently is the only reason for an idle background process to consume CPU at all.

Another problem with the scheme is that it only works as long as the background process is providing a *non critical* service. Eventually we are probably going to need some way for bgwriter/walwriter to stay alive long enough to service orphaned backends, rather than disappearing instantly if the postmaster goes away.

regards, tom lane
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 12:45:07PM -0600, Kevin Grittner wrote:
> Tom Lane wrote:
>
> > I can't see automating it though. We already have a perfectly
> > good solution to the automated shutdown problem.
>
> Oh, I totally agree with that. I somehow thought we'd gotten off
> into how someone could recover after shooting their foot.

I get that we can't prevent all pilot error, but I was hoping we could bullet-proof this a little more, especially in light of a certain extremely popular server OS's OOM killer's default behavior.

Yes, I get that that behavior is crazy, and stupid, and that people should shut it off, but it *is* our problem if we let the postmaster start (or continue) when it's set that way.

Cheers,
David.
--
David Fetter http://fetter.org/
Re: [HACKERS] kill -KILL: What happens?
On Jan13, 2011, at 19:00 , Tom Lane wrote:
> At least on Unix I don't believe there is any other solution. You
> could try looking at ps output but there's a fundamental race condition,
> ie the postmaster could spawn another child just before you kill it,
> whereupon the child is reassigned to init and there's no longer a good
> way to tell that it came from that postmaster.

Maybe I'm totally confused, but ...

Couldn't normal backends call PostmasterIsAlive and exit if not, just like the startup process, the stats collector, autovacuum, bgwriter, walwriter, walreceiver, walsender and the wal archiver already do? I assumed they do, but now that I grepped the code it seems they don't.

best regards,
Florian Pflug
Re: [HACKERS] kill -KILL: What happens?
Tom Lane wrote:

> I can't see automating it though. We already have a perfectly
> good solution to the automated shutdown problem.

Oh, I totally agree with that. I somehow thought we'd gotten off into how someone could recover after shooting their foot.

-Kevin
Re: [HACKERS] kill -KILL: What happens?
"Kevin Grittner" writes: > Tom Lane wrote: >> At least on Unix I don't believe there is any other solution. You >> could try looking at ps output but there's a fundamental race >> condition, ie the postmaster could spawn another child just before >> you kill it, whereupon the child is reassigned to init and there's >> no longer a good way to tell that it came from that postmaster. > Couldn't you run `ps auxf` and kill any postgres process which is > not functioning as postmaster (those are pretty easy to distinguish) > and which isn't the child of such a process? Is there ever a reason > to allow such an orphan to run? That's not terribly hard to do by hand, especially since the cautious DBA could also do things like checking a process' CWD to verify which postmaster it had belonged to. I can't see automating it though. We already have a perfectly good solution to the automated shutdown problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] kill -KILL: What happens?
Tom Lane wrote:

> At least on Unix I don't believe there is any other solution. You
> could try looking at ps output but there's a fundamental race
> condition, ie the postmaster could spawn another child just before
> you kill it, whereupon the child is reassigned to init and there's
> no longer a good way to tell that it came from that postmaster.

Couldn't you run `ps auxf` and kill any postgres process which is not functioning as postmaster (those are pretty easy to distinguish) and which isn't the child of such a process? Is there ever a reason to allow such an orphan to run?

-Kevin
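Kevin's rule reduces to a simple selection over a process listing. A postgres process counts as an orphan if it is not a postmaster and its parent is not a live postmaster. The sketch below assumes a hypothetical `proc_entry` table, for example parsed from ps output, and leaves open how `is_postmaster` is determined. It works on a single snapshot, so it does not address the race Tom describes. As Tom notes in his reply, a cautious DBA would still verify things like each process's working directory by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* One row of a hypothetical process listing. */
struct proc_entry {
    pid_t pid;
    pid_t ppid;
    bool  is_postmaster;   /* how this is decided is left open */
};

static bool is_listed_postmaster(const struct proc_entry *procs, size_t n,
                                 pid_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (procs[i].pid == pid && procs[i].is_postmaster)
            return true;
    return false;
}

/* Kevin's rule: a non-postmaster process whose parent is not a listed
 * postmaster is an orphan. Writes orphan pids to out[] (which must have
 * room for n entries) and returns how many were found. */
static size_t find_orphans(const struct proc_entry *procs, size_t n,
                           pid_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (procs[i].is_postmaster)
            continue;
        if (!is_listed_postmaster(procs, n, procs[i].ppid))
            out[count++] = procs[i].pid;
    }
    return count;
}
```

A backend reparented to init (PPID 1) is the typical hit. A backend whose postmaster is still listed is left alone.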
Re: [HACKERS] kill -KILL: What happens?
David Fetter writes:
> On Thu, Jan 13, 2011 at 10:41:28AM -0500, Tom Lane wrote:
>> It's just that you're then looking at having to manually clean up the
>> child processes and then restart the postmaster; a process that is not
>> only tedious but does offer the possibility of screwing yourself.
> Does this mean that there's no cross-platform way to ensure that
> killing a process results in its children's timely (i.e. before damage
> can occur) death? That such a way isn't practical from a performance
> point of view?

The simple, easy, cross-platform solution is this: don't kill -9 the postmaster. Send it one of the provisioned shutdown signals and let it kill its children for you.

At least on Unix I don't believe there is any other solution. You could try looking at ps output but there's a fundamental race condition, ie the postmaster could spawn another child just before you kill it, whereupon the child is reassigned to init and there's no longer a good way to tell that it came from that postmaster.

regards, tom lane
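The "provisioned shutdown signals" Tom refers to are PostgreSQL's documented ones: SIGTERM for smart shutdown, SIGINT for fast, and SIGQUIT for immediate. In practice `pg_ctl stop -m smart|fast|immediate` sends them for you. The sketch below shows the same idea by hand. It reads the postmaster's PID from the first line of postmaster.pid and signals only that process, leaving the postmaster to stop its own children. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum shutdown_mode { SHUTDOWN_SMART, SHUTDOWN_FAST, SHUTDOWN_IMMEDIATE };

/* Map a shutdown mode to the signal the postmaster expects for it. */
static int shutdown_signal(enum shutdown_mode mode)
{
    switch (mode) {
    case SHUTDOWN_SMART:     return SIGTERM; /* wait for sessions to end */
    case SHUTDOWN_FAST:      return SIGINT;  /* roll back, disconnect */
    case SHUTDOWN_IMMEDIATE: return SIGQUIT; /* recovery on next start */
    }
    return -1;
}

/* Read the postmaster PID from the first line of the given pid file and
 * send it the shutdown signal. Returns kill()'s result, or -1 if the
 * file cannot be read or does not start with a positive PID. */
static int stop_postmaster(const char *pidfile_path, enum shutdown_mode mode)
{
    FILE *f = fopen(pidfile_path, "r");
    long pid = 0;
    int sig = shutdown_signal(mode);

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0 || sig < 0) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return kill((pid_t) pid, sig);
}
```

Unlike kill -9, each of these signals gives the postmaster the chance to take its backends down with it. That avoids the orphaned-children cleanup the rest of the thread worries about.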
Re: [HACKERS] kill -KILL: What happens?
On Thu, Jan 13, 2011 at 10:41:28AM -0500, Tom Lane wrote:
> David Fetter writes:
> > I've noticed over the years that we give people dire warnings never to
> > send a KILL signal to the postmaster, but I'm unsure as to what the
> > potential consequences of this are, as in just exactly how this can result
> > in problems. Is there some reference I can look to for explanations
> > of the mechanism(s) whereby the damage occurs?
>
> There's no risk of data corruption, if that's what you're thinking of.
> It's just that you're then looking at having to manually clean up the
> child processes and then restart the postmaster; a process that is not
> only tedious but does offer the possibility of screwing yourself.

Does this mean that there's no cross-platform way to ensure that killing a process results in its children's timely (i.e. before damage can occur) death? That such a way isn't practical from a performance point of view?

> In particular the risk is that someone clueless enough to do this would
> next decide that removing $PGDATA/postmaster.pid, rather than killing
> all the existing children, is the quickest way to get the postmaster
> restarted. Once he's done that, his data will shortly be hosed beyond
> recovery, because now he has two noncommunicating sets of backends
> massaging the same files via separate sets of shared buffers.

Right.

> The reason this sequence of events doesn't seem improbable is that the
> error you get when you try to start a new postmaster, if there are still
> old backends running, is
>
> FATAL: pre-existing shared memory block (key 5490001, ID 15609) is still in
> use
> HINT: If you're sure there are no old server processes still running, remove
> the shared memory block or just delete the file "postmaster.pid".
>
> Maybe we should rewrite that HINT --- while it's *possible* that
> removing the shmem block or deleting postmaster.pid is the right thing
> to do, it's not exactly *likely*. I think we need to put a bit more
> emphasis on the "If ..." part. Like "If you are prepared to swear on
> your mother's grave that there are no old server processes still
> running, consider removing postmaster.pid. But first check for existing
> processes again."

Maybe the hint could give an OS-tailored way to check this...

> (BTW, I notice that this interlock against starting a new postmaster
> appears to be broken in HEAD, which is likely not unrelated to the
> fact that the contents of postmaster.pid seem to be totally bollixed
> :-()

D'oh! Well, I hope knowing it's a problem gives some kind of glimmer as to how to solve it :)

Is this worth writing tests for?

Cheers,
David.
--
David Fetter http://fetter.org/
Re: [HACKERS] kill -KILL: What happens?
David Fetter writes:
> I've noticed over the years that we give people dire warnings never to
> send a KILL signal to the postmaster, but I'm unsure as to what the
> potential consequences of this are, as in just exactly how this can result
> in problems. Is there some reference I can look to for explanations
> of the mechanism(s) whereby the damage occurs?

There's no risk of data corruption, if that's what you're thinking of. It's just that you're then looking at having to manually clean up the child processes and then restart the postmaster; a process that is not only tedious but does offer the possibility of screwing yourself.

In particular the risk is that someone clueless enough to do this would next decide that removing $PGDATA/postmaster.pid, rather than killing all the existing children, is the quickest way to get the postmaster restarted. Once he's done that, his data will shortly be hosed beyond recovery, because now he has two noncommunicating sets of backends massaging the same files via separate sets of shared buffers.

The reason this sequence of events doesn't seem improbable is that the error you get when you try to start a new postmaster, if there are still old backends running, is

FATAL: pre-existing shared memory block (key 5490001, ID 15609) is still in use
HINT: If you're sure there are no old server processes still running, remove the shared memory block or just delete the file "postmaster.pid".

Maybe we should rewrite that HINT --- while it's *possible* that removing the shmem block or deleting postmaster.pid is the right thing to do, it's not exactly *likely*. I think we need to put a bit more emphasis on the "If ..." part. Like "If you are prepared to swear on your mother's grave that there are no old server processes still running, consider removing postmaster.pid. But first check for existing processes again."

(BTW, I notice that this interlock against starting a new postmaster appears to be broken in HEAD, which is likely not unrelated to the fact that the contents of postmaster.pid seem to be totally bollixed :-()

regards, tom lane
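Tom's proposed HINT asks the DBA to check for existing processes first. For a SysV segment like the one in the FATAL message, one kernel-level check is the attach count. shmctl(IPC_STAT) reports shm_nattch, the number of processes that currently have the segment mapped; the nattch column of `ipcs -m` shows the same figure. A nonzero value for the ID in the message (15609 in Tom's example) means something, such as an orphaned backend, still holds the segment. A minimal sketch follows; the helper name is hypothetical, and nothing here claims this is how PostgreSQL itself performs the check.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Return 1 if at least one process still has the SysV shared memory
 * segment attached, 0 if none does, -1 if it cannot be examined
 * (for example, it no longer exists or we lack permission). */
static int shm_segment_in_use(int shmid)
{
    struct shmid_ds info;

    if (shmctl(shmid, IPC_STAT, &info) != 0)
        return -1;
    return info.shm_nattch > 0 ? 1 : 0;
}
```

If this returns 1 for the segment named in the error, deleting postmaster.pid is exactly the mistake Tom describes: surviving backends are still using the old shared buffers.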
[HACKERS] kill -KILL: What happens?
Folks,

I've noticed over the years that we give people dire warnings never to send a KILL signal to the postmaster, but I'm unsure as to what the potential consequences of this are, as in just exactly how this can result in problems. Is there some reference I can look to for explanations of the mechanism(s) whereby the damage occurs?

Cheers,
David.
--
David Fetter http://fetter.org/