I wrote:
> One point needing discussion is that the postmaster is currently
> coded not to send SIGUSR1 to the archiver if a fast-mode shutdown
> is under way.  I duplicated that in the added SIGUSR1 signal here,
> but I wonder whether it is sane or not.  Comments?

After chewing on that for a while, I decided it was bogus.  If we are
going to have a policy that the archiver gets a chance to archive
everything, that shouldn't depend on fast vs. smart shutdown; those
alternatives determine whether we kick clients out ungracefully, not
whether we take extra risks with committed data.

I think we should allow the archiver to finish out its tasks fully in
all non-crash cases except one: if we got SIGTERM from init.  In that
case there's a very great risk of being SIGKILL'd before we can finish
archiving.  The postmaster cannot easily tell whether its SIGTERM came
from init or not, but we can drive this off the archiver itself getting
SIGTERM'd.

I propose that if the archiver receives SIGTERM, it should cease to
issue any new archive commands, and just wait till it sees the
postmaster exit.  (It can't exit right away, since there's a race
condition: the postmaster might not have been SIGTERM'd yet, and might
therefore spawn a new archiver, which would have no idea it's unsafe to
do anything more.)

There's an obvious failure mode in that, which is that a randomly
issued SIGTERM to the archiver would shut down archiving indefinitely.
We can guard against that with a timeout: the archiver should exit a
minute or two after being SIGTERM'd, even if the postmaster is still
there.  That should certainly be enough delay to avoid the race
condition, and if in fact everything is still hunky-dory the postmaster
will immediately spawn a new archiver.

Hence, attached revised patch ...

			regards, tom lane
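
For illustration only, here is a minimal standalone sketch of the decision
rule proposed above.  It is not the attached patch.  All names in it
(archiver_next_action, install_sigterm_handler, ARCHIVER_SIGTERM_TIMEOUT_SECS)
are hypothetical, and the 60-second value is an assumption standing in for
"a minute or two".  How the archiver learns whether the postmaster is still
alive is left to the caller.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Assumed grace period; the proposal only says "a minute or two". */
#define ARCHIVER_SIGTERM_TIMEOUT_SECS 60

typedef enum
{
    ARCH_KEEP_ARCHIVING,    /* no SIGTERM seen: normal operation */
    ARCH_WAIT_QUIETLY,      /* SIGTERM seen: no new archive commands */
    ARCH_EXIT               /* postmaster gone, or timeout expired */
} ArchiverAction;

/* Set asynchronously by the signal handler, read by the main loop. */
static volatile sig_atomic_t got_sigterm = 0;

static void
handle_sigterm(int signo)
{
    (void) signo;
    got_sigterm = 1;        /* only set a flag; act on it in the main loop */
}

/* Hypothetical helper: arrange for SIGTERM to set got_sigterm. */
void
install_sigterm_handler(void)
{
    signal(SIGTERM, handle_sigterm);
}

/*
 * Decide what the archiver should do on this pass of its loop.
 * secs_since_sigterm is only meaningful when sigterm_seen is true.
 */
ArchiverAction
archiver_next_action(bool sigterm_seen, bool postmaster_alive,
                     long secs_since_sigterm)
{
    if (!sigterm_seen)
        return ARCH_KEEP_ARCHIVING;     /* normal path, not modeled here */

    assert(secs_since_sigterm >= 0);

    /* Once the postmaster has exited, nobody can spawn a new archiver. */
    if (!postmaster_alive)
        return ARCH_EXIT;

    /* Guard against a stray SIGTERM disabling archiving indefinitely. */
    if (secs_since_sigterm >= ARCHIVER_SIGTERM_TIMEOUT_SECS)
        return ARCH_EXIT;

    /* Stay alive but idle, so the postmaster does not respawn us. */
    return ARCH_WAIT_QUIETLY;
}
```

An archiver main loop would call install_sigterm_handler() once at startup,
note the time at which got_sigterm first becomes set, and then consult
archiver_next_action() before issuing each archive command.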