Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
On Thu, 2008-01-10 at 14:08 -0500, Tom Lane wrote: I wrote: I'll respin my patch this way... Third time's the charm? If we do a shutdown immediate on the postmaster *after* the bgwriter has written a shutdown checkpoint, do we have any record that there was a panic stop? Do we enter recovery in that case? I think the answers are yes and no, but just checking. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Simon Riggs [EMAIL PROTECTED] writes: If we do a shutdown immediate on the postmaster *after* the bgwriter has written a shutdown checkpoint, do we have any record that there was a panic stop? Do we enter recovery in that case? I think the answers are yes and no, but just checking. Yeah, the postmaster would complain at exit, but the pg_control state is already SHUTDOWN at that point. There'd be a log entry showing abnormal archiver exit if the panic had had any actual effect on it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Tom Lane wrote: Hence, attached revised patch ... Looks good. Something I'm still wondering is about the archiver/logger combination. What happens if a postmaster is stopped by the user and the archiver is still running, and the user starts a new postmaster? This would launch a new archiver and logger; and there are now two loggers possibly writing to the same files, and truncated log lines could occur. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Alvaro Herrera [EMAIL PROTECTED] writes: Something I'm still wondering is about the archiver/logger combination. What happens if a postmaster is stopped by the user and the archiver is still running, and the user starts a new postmaster? This would launch a new archiver and logger; and there are now two loggers possibly writing to the same files, and truncated log lines could occur. I'm not nearly as worried about that as I am about the prospect of two concurrent archivers :-( There was discussion of having a lock file for the archiver, but it's still an open issue. I'm not sure how to solve the problem of stale lockfiles --- unlike the postmaster, the archiver can't assume that it's the only live process with the postgres userid. For example, after a system crash-and-restart, it's entirely likely that the PID formerly held by the archiver is now held by the bgwriter, making the lockfile (if any) look live. Maybe we should go back to the plan of having the postmaster wait for the archiver to exit. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
On Thu, 2008-01-10 at 12:28 -0300, Alvaro Herrera wrote: Yeah, that seems the safest to me -- the problem is that it complicates the shutdown sequence a fair bit, because postmaster must act differently depending on whether archiving is enabled or not: wait for bgwriter exit if disabled, or for archiver exit otherwise. If there's nothing to write, archiver will exit quickly, so in most cases they will look the same. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Tom Lane wrote: There was discussion of having a lock file for the archiver, but it's still an open issue. I'm not sure how to solve the problem of stale lockfiles --- unlike the postmaster, the archiver can't assume that it's the only live process with the postgres userid. For example, after a system crash-and-restart, it's entirely likely that the PID formerly held by the archiver is now held by the bgwriter, making the lockfile (if any) look live. Ah, right :-( Maybe we should go back to the plan of having the postmaster wait for the archiver to exit. Yeah, that seems the safest to me -- the problem is that it complicates the shutdown sequence a fair bit, because postmaster must act differently depending on whether archiving is enabled or not: wait for bgwriter exit if disabled, or for archiver exit otherwise. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: Maybe we should go back to the plan of having the postmaster wait for the archiver to exit. Yeah, that seems the safest to me -- the problem is that it complicates the shutdown sequence a fair bit, because postmaster must act differently depending on whether archiving is enabled or not: wait for bgwriter exit if disabled, or for archiver exit otherwise. Given the recent changes to make the postmaster act as a state machine, I don't think this is really a big deal --- it's just one more state. The bigger part is that the archiver can't wait for postmaster exit. We'll need a proper shutdown signal for the archiver, but since it's not using SIGUSR2 that can be commandeered easily. So it'd be like SIGUSR1 - do an archive cycle SIGUSR2 - do an archive cycle and exit no postmaster - just exit The rationale for the last is that it's a crash situation, and furthermore there's a risk of someone starting a new postmaster and a conflicting archiver. So we should put back the behavior my last patch removed of aborting archiving immediately on postmaster death. I'll respin my patch this way... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
On Thu, 2008-01-10 at 10:13 -0500, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Something I'm still wondering is about the archiver/logger combination. What happens if a postmaster is stopped by the user and the archiver is still running, and the user starts a new postmaster? This would launch a new archiver and logger; and there are now two loggers possibly writing to the same files, and truncated log lines could occur. I'm not nearly as worried about that as I am about the prospect of two concurrent archivers :-( How strange, I was just looking at that particular possibility when your mail arrived. The earlier patch looks good, but I see you've reverted the PostmasterIsAlive() check in pgarch_ArchiverCopyLoop(). That was put in to prevent multiple archivers. Can we keep that? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
I wrote: I'll respin my patch this way... Third time's the charm? regards, tom lane binFKkWVCJKov.bin Description: archiver-shutdown-3.patch ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Tom Lane wrote: The attached patch fixes archiver shutdown in what seems to me to be a sane way. With the patch, we send SIGQUIT to the archiver only for panic-stop situations (backend crash or immediate-mode shutdown). This is important because the postmaster is coded to send SIGQUIT to the entire process group, meaning we'd also ungracefully terminate any currently-running archive command, which does not seem like a good idea for normal exit. Instead, the shutdown protocol is that *after* the bgwriter has quit, we send SIGUSR1 (ie, the normal archiver wakeup signal) to the archiver. This ensures that it will do a normal archiving cycle after the last possible creation of WAL entries. Hmm, so the postmaster is gone during the last archiving cycle? What about syslogger? Is the archiver able to log stuff in the last cycle? The comment in line 2180 seems a bit bogus ...? -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
Alvaro Herrera [EMAIL PROTECTED] writes: Hmm, so the postmaster is gone during the last archiving cycle? What about syslogger? Is the archiver able to log stuff in the last cycle? The logger is no problem --- it quits when it sees EOF on its input pipe, which means that all upstream processes are gone. The comment in line 2180 seems a bit bogus ...? Yeah, that could use a bit more work I guess, since normal children sounds like it would refer to more than just backends. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] Revised patch for fixing archiver shutdown behavior
I wrote: One point needing discussion is that the postmaster is currently coded not to send SIGUSR1 to the archiver if a fast-mode shutdown is under way. I duplicated that in the added SIGUSR1 signal here, but I wonder whether it is sane or not. Comments? After chewing on that for awhile, I decided it was bogus. If we are going to have a policy that the archiver gets a chance to archive everything, that shouldn't depend on fast vs. smart shutdown; those alternatives determine whether we kick clients out ungracefully, not whether we take extra risks with committed data. I think we should allow the archiver to finish out its tasks fully in all non-crash cases except one: if we got SIGTERM from init. In that case there's a very great risk of being SIGKILL'd before we can finish archiving. The postmaster cannot easily tell whether its SIGTERM came from init or not, but we can drive this off the archiver itself getting SIGTERM'd. I propose that if the archiver receives SIGTERM, it should cease to issue any new archive commands, but just wait till it sees the postmaster exit. (It can't exit right away, since there's a race condition: the postmaster might not have been SIGTERM'd yet, and might therefore spawn a new archiver, which would have no idea it's unsafe to do anything more.) There's an obvious failure mode in that, which is that a randomly issued SIGTERM to the archiver would shut down archiving indefinitely. We can guard against that with a timeout: the archiver should exit a minute or two after being SIGTERM'd, even if the postmaster is still there. That should certainly be enough delay to avoid the race condition, and if in fact everything is still hunky-dory the postmaster will immediately spawn a new archiver. Hence, attached revised patch ... regards, tom lane binN52EbkMKOk.bin Description: archiver-shutdown-2.patch ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org