On 11/2/21, 8:07 AM, "Bossart, Nathan" <bossa...@amazon.com> wrote:
> The main motivation is provide a way to archive without shelling out.
> This reduces the amount of overhead, which can improve archival rate
> significantly. It should also make it easier to archive more safely.
> For example, many of the common shell commands used for archiving
> won't fsync the data, but it isn't too hard to do so via C. The
> current proposal doesn't introduce any extra infrastructure for
> batching or parallelism, but it is probably still possible. I would
> like to eventually add batching, but for now I'm only focused on
> introducing basic archive module support.
As noted above, the latest patch set (v11) doesn't add any batching or parallelism. Now that beb4e9b is committed (which causes the archiver to gather multiple files to archive in each scan of archive_status), it seems like a good time to discuss this a bit further. I think there are some interesting design considerations.

As is, the archive module infrastructure in the v11 patch set should reduce the per-file overhead quite a bit, and I observed a noticeable speedup with a basic file-copying archive strategy (although this is likely not representative of real-world workloads). I believe it would be possible for archive module authors to implement batching/parallelism themselves, but AFAICT it would still require hacks similar to what folks do today with archive_command. For example, you could look ahead in archive_status, archive a bunch of files in a batch or in parallel with background workers, and then quickly return true when the archive_library is called for later files in the batch. (A rough sketch of this hack is at the end of this message.)

Alternatively, we could offer some kind of built-in batching support in the archive module infrastructure. One simple approach would be to have pgarch_readyXlog() optionally return the entire list of files gathered from the directory scan of archive_status (presently up to 64 files). Or we could provide a GUC like archive_batch_size that would allow users to limit how many files are sent to the archive_library each time. This list would be given to pgarch_archiveXlog(), which would report which files were successfully archived and which failed. (The second sketch below shows roughly what such an interface could look like.) I think this could be done for archive_command as well, although it might be tricky to determine which files were archived successfully. To handle that, we might just need to fail the whole batch whenever the archive_command return value indicates failure.

Another interesting change is that the special timeline file handling added in beb4e9b becomes less useful. Presently, if a timeline history file is marked ready for archival, we force pgarch_readyXlog() to do a new scan of archive_status the next time it is called in order to pick it up as soon as possible (ordinarily it just returns the files gathered in a previous scan until it runs out). If we are sending a list of files to the archive module, it will be more difficult to ensure timeline history files are picked up so quickly. Perhaps this is a reasonable tradeoff to make when archive batching is enabled.

I think the retry logic can stay roughly the same. If any files in a batch cannot be archived, wait a second before retrying; if that happens a few times in a row, stop archiving for a bit. (The last sketch below illustrates this.) It wouldn't be quite as precise as what's there today because the failures could be for different files each time, but I don't know if that is terribly important.

Finally, I wonder if batching support is something we should bother with at all for the first round of archive module support. I believe it could easily be added later, although it might require changing the archiving callback that modules implement so that it accepts and returns a list of files. IMO the archive module infrastructure is still an improvement even without batching, and it fits nicely into the existing behavior of the archiver process.

I'm curious what others think about all this.

Nathan
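
To make the look-ahead hack described above a bit more concrete, here is a rough, untested sketch. The callback name and signature (my_archive_file) and the helpers batch_lookahead() and ship_batch() are all invented for illustration and are not part of the v11 patch set; the point is only that the module does the batching itself and then answers "already done" for files it handled ahead of time.

    #include "postgres.h"

    #include "access/xlog_internal.h"   /* for MAXFNAMELEN */

    /*
     * Hypothetical helpers the module author would have to write:
     * batch_lookahead() scans archive_status for .ready files and returns
     * how many it found, and ship_batch() archives them all at once (or
     * hands them to background workers), returning true only if everything
     * made it.
     */
    static int  batch_lookahead(char files[][MAXFNAMELEN], int max_files);
    static bool ship_batch(char files[][MAXFNAMELEN], int nfiles,
                           const char *current_file, const char *current_path);

    /* files we already archived ahead of time in the most recent batch */
    static char archived_batch[64][MAXFNAMELEN];
    static int  archived_count = 0;

    /*
     * Per-file archiving callback.  The name and signature are assumptions;
     * the idea is only that the archiver hands the module one file at a time
     * and expects true once that file is safely archived.
     */
    static bool
    my_archive_file(const char *file, const char *path)
    {
        int     nfiles;

        /* Already handled as part of an earlier batch?  Then just say so. */
        for (int i = 0; i < archived_count; i++)
        {
            if (strcmp(archived_batch[i], file) == 0)
                return true;
        }

        /*
         * Otherwise, look ahead for more .ready files and archive the whole
         * batch, including the file we were asked about.  (Re-archiving a
         * file the module already copied is assumed to be harmless here.)
         */
        nfiles = batch_lookahead(archived_batch, lengthof(archived_batch));
        if (!ship_batch(archived_batch, nfiles, file, path))
        {
            archived_count = 0;     /* don't remember a batch that failed */
            return false;
        }

        archived_count = nfiles;
        return true;
    }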
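
And here is a purely hypothetical sketch of what built-in batching support could look like from a module's point of view. Again, none of these names exist anywhere; this is only meant to show the shape of a callback that accepts a list of files and reports per-file results.

    #include "postgres.h"

    /*
     * Everything here is invented for illustration.  The fixed-size arrays
     * (64, matching the number of files currently gathered per directory
     * scan) are only there to keep the sketch short.
     */
    typedef struct ArchiveBatch
    {
        int         nfiles;         /* number of files handed to the module */
        const char *files[64];      /* file names (WAL segments, .history, ...) */
        const char *paths[64];      /* full paths (under pg_wal) */
        bool        archived[64];   /* filled in by the module: per-file result */
    } ArchiveBatch;

    /*
     * Instead of a per-file callback, a module would implement something
     * like this; the archiver would then mark .done only the files whose
     * archived[] entry is true and retry the rest later.
     */
    typedef bool (*ArchiveBatchCB) (ArchiveBatch *batch);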
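
Finally, the retry behavior I have in mind looks roughly like this. have_files_to_archive() and archive_next_batch() are stand-ins for whatever ends up driving a batch; the one-second sleep and the retry limit mirror what pgarch.c already does for individual files.

    #include "postgres.h"

    #define NUM_ARCHIVE_RETRIES 3   /* same constant pgarch.c uses today */

    /* hypothetical stand-ins for whatever ends up driving a batch */
    static bool have_files_to_archive(void);
    static bool archive_next_batch(void);

    static void
    archiver_copy_loop_sketch(void)
    {
        int         consecutive_failures = 0;

        while (have_files_to_archive())
        {
            if (archive_next_batch())   /* true only if the whole batch made it */
            {
                consecutive_failures = 0;
                continue;
            }

            if (++consecutive_failures >= NUM_ARCHIVE_RETRIES)
            {
                /* repeated failures in a row: stop archiving for a while */
                break;
            }

            pg_usleep(1000000L);        /* wait a second before retrying */
        }
    }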