On Fri, Mar 10, 2017 at 9:48 AM Stuart Bishop <[email protected]> wrote:
> Hi.
>
> Is there a way to recover to the latest available target_recovery_time
> or target_recovery_xid, and automatically promote? The best I seem to
> be able to do is watch the logs and manually promote once wal fetching
> starts failing and I've run out of files to replay.
>
> I believe that to do automatic promotion after recovery I need to
> specify a target_recovery_time or target_recovery_xid and I can't see
> any way to determine that. Unless perhaps I download the wal files,
> navigating timeline switches, and analyze them.
>

You can turn standby_mode off, and then the first WAL-E download
failure will cause a promotion.

I am moderately cautious about this: if WAL-E, or any wrapping program,
exits with an unexpected status code, the system will leave recovery and
start up. Postgres only aborts recovery when the command dies on a
signal or exits with a status greater than 125; any other non-zero exit
is taken to mean the archive is exhausted:

    [....]
     * However, if the failure was due to any sort of signal, it's best to
     * punt and abort recovery.  (If we "return false" here, upper levels will
     * assume that recovery is complete and start up the database!) It's
     * essential to abort on child SIGINT and SIGQUIT, because per spec
     * system() ignores SIGINT and SIGQUIT while waiting; if we see one of
     * those it's a good bet we should have gotten it too.
     *
     * On SIGTERM, assume we have received a fast shutdown request, and exit
     * cleanly. It's pure chance whether we receive the SIGTERM first, or the
     * child process. If we receive it first, the signal handler will call
     * proc_exit, otherwise we do it here. If we or the child process received
     * SIGTERM for any other reason than a fast shutdown request, postmaster
     * will perform an immediate shutdown when it sees us exiting
     * unexpectedly.
     *
     * Per the Single Unix Spec, shells report exit status > 128 when a called
     * command died on a signal.  Also, 126 and 127 are used to report
     * problems such as an unfindable command; treat those as fatal errors
     * too.
     */
    if (WIFSIGNALED(rc) && WTERMSIG(rc) == SIGTERM)
        proc_exit(1);

    signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

I have modestly tried to make WAL-E safe for this purpose, but it has
never quite sat right with me to trust this mechanism in Postgres to
promote a database. Many programs, e.g. wrappers like envdir or
whathaveyou, are not guaranteed to emit status codes > 125 in all
non-archive-recovery-failure cases.

That said, it will work nearly 100% of the time.
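
To make the exit-status handling quoted above concrete, here is a rough
Python sketch. The first function mirrors the quoted C test on a raw
wait status, simplified in that it treats status 0 as success without
the extra checks Postgres does on the restored file. The second is a
hypothetical policy for a wrapper script: pass through only statuses it
positively recognizes as "segment not found" and turn everything else
into a status above 125, so Postgres aborts recovery instead of
promoting. The names classify_restore_status, wrapper_exit_code,
NOT_FOUND_CODES and ABORT_RECOVERY are mine, and the assumption that
exit status 1 means "not found" is only a placeholder; I have not
checked what WAL-E or any particular wrapper actually returns.

```python
import os
import signal

# Assumption for illustration only: the set of exit codes that the wrapped
# fetch command uses to say "that WAL segment does not exist in the archive".
NOT_FOUND_CODES = frozenset({1})

# Any exit status > 125 makes Postgres treat the failure as fatal.
ABORT_RECOVERY = 255


def classify_restore_status(status: int) -> str:
    """Mirror the quoted Postgres test on a raw wait status (as from system())."""
    if status == 0:
        return "restored"
    # SIGTERM on the child is assumed to be a fast shutdown request.
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGTERM:
        return "shutdown"
    # Any other signal, or an exit status above 125, aborts recovery.
    if os.WIFSIGNALED(status) or os.WEXITSTATUS(status) > 125:
        return "fatal"
    # Everything else is read as "no more WAL": recovery ends, database starts.
    return "not_found"


def wrapper_exit_code(child_returncode: int,
                      not_found_codes: frozenset = NOT_FOUND_CODES) -> int:
    """Hypothetical wrapper policy: fail closed on anything unrecognized.

    child_returncode follows subprocess conventions (negative means the
    child was killed by a signal).
    """
    if child_returncode == 0:
        return 0
    if child_returncode in not_found_codes:
        return 1
    return ABORT_RECOVERY
```

A wrapper would run the fetch command with subprocess.run(...) and call
sys.exit(wrapper_exit_code(result.returncode)). The point of failing
closed is that an unexpected error (bad credentials, a network blip, a
missing binary) then stops recovery rather than being mistaken for the
end of the archive.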
