Currently, the Startup process is responsible for running
restore_command. So when the Startup process is busy or waiting, no
new WAL files arrive.

That has these effects:
* Recovery must wait while the Startup process requests the next WAL
file. This reduces the performance of archive recovery.
* If replication is file-based then no new files can be downloaded
while we are waiting. If the Startup process waits, it is then much
slower to catch up than it would be if it had already downloaded the
files from the archive.
* We cannot run an archive_cleanup_command, so the archive keeps growing.
* Cascading from a standby that uses file-based replication is not
easily possible.

My solution is to create a new process called the DeArchiver. This
will run restore_command in a tight loop until the number of files
would exceed wal_keep_files, then sleep. Each time the DeArchiver
executes restore_command it will record the return code and, if rc=0,
the new XLogRecPtr reached. If standby_mode = on it will continue to
retry indefinitely.
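
To make the loop concrete, here is a rough sketch in Python rather
than backend C. Everything named here (DeArchiverState, restore,
pending_files, nap_seconds, max_steps) is made up for illustration and
is not existing PostgreSQL code. I have also assumed that the process
naps before retrying a failed restore in standby mode, and that it
stops at the first missing file when standby_mode is off.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeArchiverState:
    """Hypothetical status the Startup process could read."""
    last_rc: Optional[int] = None        # return code of the latest restore_command
    restored_upto: Optional[int] = None  # stand-in for the XLogRecPtr reached


def dearchiver_loop(restore: Callable[[int], int],
                    pending_files: Callable[[], int],
                    state: DeArchiverState,
                    next_segno: int,
                    wal_keep_files: int,
                    standby_mode: bool,
                    sleep: Callable[[float], None] = time.sleep,
                    nap_seconds: float = 1.0,
                    max_steps: Optional[int] = None) -> int:
    """Restore segments in order; return the next segment number wanted.

    max_steps exists only so the sketch can be exercised in a test.
    """
    steps = 0
    while max_steps is None or steps < max_steps:
        steps += 1
        if pending_files() + 1 > wal_keep_files:
            # One more file would exceed wal_keep_files: sleep, then recheck.
            sleep(nap_seconds)
            continue
        rc = restore(next_segno)  # stand-in for running restore_command
        state.last_rc = rc
        if rc == 0:
            state.restored_upto = next_segno
            next_segno += 1
        elif standby_mode:
            # Assumption: nap before retrying rather than spinning.
            sleep(nap_seconds)
        else:
            break  # Assumption: outside standby mode, stop at the first failure.
    return next_segno
```

Here pending_files stands for however we count restored files that
are still waiting in pg_xlog.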

The Startup process will just read files from pg_xlog rather than from
the archive, just as it does for streaming, so this will remove the
special case code in xlog.c. (WALReceiver and this process will still
need to coordinate so that they are never both active at the same
time, as now.)

This proposal gives a performance gain because the DeArchiver can be
restoring the next file while the Startup process is processing the
current file, so they work together using pipeline parallelism.
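
As a toy illustration of that overlap, the sketch below uses a thread
and a bounded queue in Python. The real thing would be two server
processes handing files over via pg_xlog; run_pipeline, restore,
replay and depth are placeholder names, not PostgreSQL functions.

```python
import queue
import threading


def run_pipeline(segments, restore, replay, depth=2):
    """Restore segment N+1 in the background while segment N is replayed."""
    ready = queue.Queue(maxsize=depth)  # bounded, so restore cannot run far ahead

    def restorer():
        for seg in segments:
            ready.put(restore(seg))  # blocks once `depth` files are waiting
        ready.put(None)              # sentinel: nothing more to restore

    worker = threading.Thread(target=restorer)
    worker.start()

    replayed = []
    while (item := ready.get()) is not None:
        replayed.append(replay(item))  # the "Startup process" side
    worker.join()
    return replayed
```

The bounded queue plays roughly the role of the wal_keep_files limit:
the restoring side stalls when it gets too far ahead of replay.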

The DeArchiver would start when we are not in crash recovery and exit
at the end of recovery. This would then allow restore_command to be
set via reload rather than restart.

Previously, we have given greater weight to files from the archive
than to files already in pg_xlog. To ensure that behaviour continues,
if restore_command is set the Startup process will read the files in
the pg_xlog directory and remember which ones were there at startup.
That way it will be able to tell the difference between files newly
downloaded and those already in the directory. If a file is absent
from the archive we will use the file from pg_xlog.
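
The precedence rule could look something like the sketch below. It is
Python for brevity, and it assumes the DeArchiver can report which
file names it has restored; choose_wal_source and both set arguments
are hypothetical names of mine.

```python
from typing import AbstractSet, Optional


def choose_wal_source(name: str,
                      present_at_startup: AbstractSet[str],
                      restored_from_archive: AbstractSet[str]) -> Optional[str]:
    """Say which copy of a WAL file the Startup process should trust."""
    if name in restored_from_archive:
        return "archive"  # newly downloaded copies take precedence
    if name in present_at_startup:
        return "pg_xlog"  # absent from the archive: use the pre-existing file
    return None           # not available from either place yet
```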

This makes file-based and stream-based replication work in a similar
way, which is neater, and it also means all required files are
available in case of a crash, which means we can more easily get rid
of shutdown checkpoints in case of failover (discussed on separate
thread).

Since more files are available, it allows cascading replication to
have a sender which receives WAL data in files.

Which do we prefer: "DeArchiver", "Restore process", or "WALFileReceiver"?

Thoughts?

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
