On Wed, Nov 13, 2013 at 3:42 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> When an external recovery command such as restore_command or
> archive_cleanup_command fails, it just reports "return code 34567" or
> something, but we have facilities to decode this properly, so use
> them.

I think this is a very good idea, but you should go a bit further:
document how restore_command treats certain return codes specially.
Currently, the documentation says:

"It is important that the archive command return zero exit status if
and only if it succeeded. Upon getting a zero result, PostgreSQL will
assume that the WAL segment file has been successfully archived, and
will remove or recycle it. However, a nonzero status tells PostgreSQL
that the file was not archived; it will try again periodically until
it succeeds."

Yes, this concerns archive_command (where non-zero return codes really
are never distinguished from one another), but nothing much is said
anywhere else about the return code of restore_command specifically, so
the implication is that it behaves as the exact inverse of
archive_command. In reality, some special return codes are significant
to restore_command: they make recovery abort, because they're taken as
proxies for various failures after which it isn't sensible to continue
recovery.
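
To make the distinction concrete, here is a rough sketch of the
classification as I read it from the source. It is illustrative Python,
not the actual C code, and the function names and result labels are
made up for this example. The cutoff of 125 and the treatment of
signals reflect my understanding of the current behavior (shells report
126/127 for commands that can't be run and values above 128 for
commands killed by a signal); please treat those details as assumptions
to be checked against the code. I believe SIGTERM is additionally
handled as a shutdown request, which the sketch ignores.

```python
import subprocess


def classify_restore_result(returncode: int) -> str:
    """Classify a subprocess-style return code from a restore command.

    A negative value means the process was killed by a signal, following
    the convention used by Python's subprocess module.
    """
    if returncode == 0:
        # The segment was restored; recovery continues with it.
        return "restored"
    if returncode < 0 or returncode > 125:
        # Killed by a signal, or a shell-level failure such as
        # "command not found": assumed to make recovery abort.
        return "abort"
    # Any other non-zero status is taken to mean "no such file in the
    # archive", i.e. the end of the WAL stream has been reached.
    return "not_found"


def run_restore(command: str) -> str:
    """Run a (hypothetical) restore command via the shell and classify it."""
    completed = subprocess.run(command, shell=True)
    return classify_restore_result(completed.returncode)
```

So a script that exits with 1 because the file isn't in the archive and
a script that fails with 127 because of a typo in its path lead to very
different outcomes, and only the first is what an administrator would
expect from reading the documentation.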

We're talking about the difference between recovery aborting, and
recovery having conceptually "reached the end of the WAL stream", so
it's very surprising that this isn't documented currently.

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)