On 11/2/21, 8:07 AM, "Bossart, Nathan" <bossa...@amazon.com> wrote:
> The main motivation is provide a way to archive without shelling out.
> This reduces the amount of overhead, which can improve archival rate
> significantly. It should also make it easier to archive more safely.
> For example, many of the common shell commands used for archiving
> won't fsync the data, but it isn't too hard to do so via C. The
> current proposal doesn't introduce any extra infrastructure for
> batching or parallelism, but it is probably still possible. I would
> like to eventually add batching, but for now I'm only focused on
> introducing basic archive module support.
As noted above, the latest patch set (v11) doesn't add any batching or parallelism. Now that beb4e9b is committed (which causes the archiver to gather multiple files to archive in each scan of archive_status), it seems like a good time to discuss this a bit further. I think there are some interesting design considerations.

As is, the archive module infrastructure in the v11 patch set should reduce the per-file overhead quite a bit, and I observed a noticeable speedup with a basic file-copying archive strategy (although this is likely not representative of real-world workloads). I believe it would be possible for archive module authors to implement batching/parallelism themselves, but AFAICT it would still require hacks similar to what folks do today with archive_command. For example, you could look ahead in archive_status, archive a bunch of files in a batch or in parallel with background workers, and then quickly return true when the archive_library is called for later files in the batch. (A rough sketch of this hack is at the end of this message.)

Alternatively, we could offer some kind of built-in batching support in the archive module infrastructure. One simple approach would be to have pgarch_readyXlog() optionally return the entire list of files gathered from the directory scan of archive_status (presently up to 64 files). Or we could provide a GUC like archive_batch_size that would allow users to limit how many files are sent to the archive_library each time. This list would be given to pgarch_archiveXlog(), which would report which files were successfully archived and which failed. (The second sketch below shows roughly what such an interface could look like.) I think this could be done for archive_command as well, although it might be tricky to determine which files were archived successfully. To handle that, we might just need to fail the whole batch whenever the archive_command return value indicates failure.

Another interesting change is that the special timeline file handling added in beb4e9b becomes less useful. Presently, if a timeline history file is marked ready for archival, we force pgarch_readyXlog() to do a new scan of archive_status the next time it is called in order to pick it up as soon as possible (ordinarily it just returns the files gathered in a previous scan until it runs out). If we are sending a list of files to the archive module, it will be more difficult to ensure timeline history files are picked up so quickly. Perhaps this is a reasonable tradeoff to make when archive batching is enabled.

I think the retry logic can stay roughly the same. If any files in a batch cannot be archived, wait a second before retrying; if that happens a few times in a row, stop archiving for a bit. (The last sketch below illustrates this.) It wouldn't be quite as precise as what's there today because the failures could be for different files each time, but I don't know if that is terribly important.

Finally, I wonder if batching support is something we should bother with at all for the first round of archive module support. I believe it could easily be added later, although it might require changing the archiving callback that modules implement so that it accepts and returns a list of files. IMO the archive module infrastructure is still an improvement even without batching, and it fits nicely into the existing behavior of the archiver process.

I'm curious what others think about all this.

Nathan
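
To make the look-ahead hack described above a bit more concrete, here is a rough, untested sketch. The callback name and signature (my_archive_file) and the helpers batch_lookahead() and ship_batch() are all invented for illustration and are not part of the v11 patch set; the point is only that the module does the batching itself and then answers "already done" for files it handled ahead of time.

    #include "postgres.h"

    #include "access/xlog_internal.h"   /* for MAXFNAMELEN */

    /*
     * Hypothetical helpers the module author would have to write:
     * batch_lookahead() scans archive_status for .ready files and returns
     * how many it found, and ship_batch() archives them all at once (or
     * hands them to background workers), returning true only if everything
     * made it.
     */
    static int  batch_lookahead(char files[][MAXFNAMELEN], int max_files);
    static bool ship_batch(char files[][MAXFNAMELEN], int nfiles,
                           const char *current_file, const char *current_path);

    /* files we already archived ahead of time in the most recent batch */
    static char archived_batch[64][MAXFNAMELEN];
    static int  archived_count = 0;

    /*
     * Per-file archiving callback.  The name and signature are assumptions;
     * the idea is only that the archiver hands the module one file at a time
     * and expects true once that file is safely archived.
     */
    static bool
    my_archive_file(const char *file, const char *path)
    {
        int     nfiles;

        /* Already handled as part of an earlier batch?  Then just say so. */
        for (int i = 0; i < archived_count; i++)
        {
            if (strcmp(archived_batch[i], file) == 0)
                return true;
        }

        /*
         * Otherwise, look ahead for more .ready files and archive the whole
         * batch, including the file we were asked about.  (Re-archiving a
         * file the module already copied is assumed to be harmless here.)
         */
        nfiles = batch_lookahead(archived_batch, lengthof(archived_batch));
        if (!ship_batch(archived_batch, nfiles, file, path))
        {
            archived_count = 0;     /* don't remember a batch that failed */
            return false;
        }

        archived_count = nfiles;
        return true;
    }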
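
And here is a purely hypothetical sketch of what built-in batching support could look like from a module's point of view. Again, none of these names exist anywhere; this is only meant to show the shape of a callback that accepts a list of files and reports per-file results.

    #include "postgres.h"

    /*
     * Everything here is invented for illustration.  The fixed-size arrays
     * (64, matching the number of files currently gathered per directory
     * scan) are only there to keep the sketch short.
     */
    typedef struct ArchiveBatch
    {
        int         nfiles;         /* number of files handed to the module */
        const char *files[64];      /* file names (WAL segments, .history, ...) */
        const char *paths[64];      /* full paths (under pg_wal) */
        bool        archived[64];   /* filled in by the module: per-file result */
    } ArchiveBatch;

    /*
     * Instead of a per-file callback, a module would implement something
     * like this; the archiver would then mark .done only the files whose
     * archived[] entry is true and retry the rest later.
     */
    typedef bool (*ArchiveBatchCB) (ArchiveBatch *batch);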
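
Finally, the retry behavior I have in mind looks roughly like this. have_files_to_archive() and archive_next_batch() are stand-ins for whatever ends up driving a batch; the one-second sleep and the retry limit mirror what pgarch.c already does for individual files.

    #include "postgres.h"

    #define NUM_ARCHIVE_RETRIES 3   /* same constant pgarch.c uses today */

    /* hypothetical stand-ins for whatever ends up driving a batch */
    static bool have_files_to_archive(void);
    static bool archive_next_batch(void);

    static void
    archiver_copy_loop_sketch(void)
    {
        int         consecutive_failures = 0;

        while (have_files_to_archive())
        {
            if (archive_next_batch())   /* true only if the whole batch made it */
            {
                consecutive_failures = 0;
                continue;
            }

            if (++consecutive_failures >= NUM_ARCHIVE_RETRIES)
            {
                /* repeated failures in a row: stop archiving for a while */
                break;
            }

            pg_usleep(1000000L);        /* wait a second before retrying */
        }
    }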