On Fri, Mar 10, 2017 at 9:48 AM Stuart Bishop <[email protected]>
wrote:

> Hi.
>
> Is there a way to recover to the latest available target_recovery_time
> or target_recovery_xid, and automatically promote? The best I seem to
> be able to do is watch the logs and manually promote once wal fetching
> starts failing and I've run out of files to replay.
>
> I believe that to do automatic promotion after recovery I need to
> specify a target_recovery_time or target_recovery_xid and I can't see
> any way to determine that. Unless perhaps I download the wal files,
> navigating timeline switches, and analyze them.
>

You can turn standby_mode off, and then the first WAL-E download failure
will cause a promotion.
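
As a rough sketch of that setup, recovery.conf could look something like the
following. This assumes the PostgreSQL 9.x recovery.conf layout, and the envdir
directory is only an assumption taken from the usual WAL-E layout, so adjust
it to your installation. With no recovery target set, replay simply continues
until the restore command reports a failure.

```
# recovery.conf (sketch; the envdir path below is an assumption)
standby_mode = 'off'
restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
```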

I am moderately cautious about this: if WAL-E, or any program wrapping it,
exits with an unexpected status code, the system will leave recovery and
start up.

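Postgres only aborts recovery when the restore command dies on a signal or
exits with a status greater than 125; any other nonzero status is taken to
mean the archive has run out: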

[....]
* However, if the failure was due to any sort of signal, it's best to
* punt and abort recovery.  (If we "return false" here, upper levels will
* assume that recovery is complete and start up the database!) It's
* essential to abort on child SIGINT and SIGQUIT, because per spec
* system() ignores SIGINT and SIGQUIT while waiting; if we see one of
* those it's a good bet we should have gotten it too.
*
* On SIGTERM, assume we have received a fast shutdown request, and exit
* cleanly. It's pure chance whether we receive the SIGTERM first, or the
* child process. If we receive it first, the signal handler will call
* proc_exit, otherwise we do it here. If we or the child process received
* SIGTERM for any other reason than a fast shutdown request, postmaster
* will perform an immediate shutdown when it sees us exiting
* unexpectedly.
*
* Per the Single Unix Spec, shells report exit status > 128 when a called
* command died on a signal.  Also, 126 and 127 are used to report
* problems such as an unfindable command; treat those as fatal errors
* too.
*/
if (WIFSIGNALED(rc) && WTERMSIG(rc) == SIGTERM)
    proc_exit(1);

signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
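
To make the consequence of that check concrete, here is a minimal standalone
sketch of the same classification. The enum and the function name are
hypothetical labels of mine, not PostgreSQL identifiers, and the real code
does more (for instance, it checks the restored file after a zero status).

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical labels for this sketch; not PostgreSQL identifiers. */
typedef enum {
    RESTORE_OK,         /* status 0: the command reported success */
    RESTORE_END_OF_WAL, /* exit status 1..125: taken as "no more WAL" */
    RESTORE_SHUTDOWN,   /* child died on SIGTERM: exit quietly */
    RESTORE_FATAL       /* other signal, or exit status > 125: abort */
} restore_outcome;

/* Classify a wait status as returned by system(3). */
static restore_outcome
classify_restore_status(int rc)
{
    if (rc == 0)
        return RESTORE_OK;
    if (WIFSIGNALED(rc) && WTERMSIG(rc) == SIGTERM)
        return RESTORE_SHUTDOWN;
    /* Same test as the excerpt: any signal, or a status above 125. */
    if (WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125)
        return RESTORE_FATAL;
    /* Everything else looks like an exhausted archive. */
    return RESTORE_END_OF_WAL;
}
```

For example, classify_restore_status(system("exit 1")) comes out as
RESTORE_END_OF_WAL, while a wrapper that cannot find its command and exits
with 127 comes out as RESTORE_FATAL. The worry is the first case: any
unrelated failure that happens to exit with a small status looks exactly
like the end of the archive.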

I have made a modest attempt to make WAL-E safe for this purpose, but it has
never quite sat right with me to trust this mechanism in Postgres to promote
a database. Many programs, e.g. wrappers like envdir or what have you, are
not guaranteed to exit with a status greater than 125 for every failure other
than a genuinely missing WAL segment.

That said, it will work nearly 100% of the time.
