Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sun, Feb 19, 2023 at 08:06:24PM +0530, Robert Haas wrote:
> > I mean, my idea was to basically just have one big callback:
> > ArchiverModuleMainLoopCB(), which wouldn't return, or perhaps would
> > only return when archiving was totally caught up and there was
> > nothing more to do right now. That callback could call functions
> > like AreThereAnyMoreFilesIShouldBeArchivingAndIfYesWhatIsTheNextOne().
> > It would call that function, find out about a file, and start an
> > HTTP session or whatever, then call the function again and start
> > another HTTP session for the second file, and so on until it had as
> > much concurrency as it wanted. When it hit the concurrency limit,
> > it would wait until at least one HTTP request finished. At that
> > point it would call HeyEverybodyISuccessfullyArchivedAWalFile(),
> > after which it could again ask for the next file, start a request
> > for that one, and so forth.
>
> This archiving implementation is not completely impossible with the
> current API infrastructure, either? Consider archiving as a two-step
> process: segments are first copied into a cheap, reliable area, then
> pushed in bulk to a more remote area like an S3 bucket. Of course
> this depends on other things like the cluster structure, but
> redundancy can be added with standby archiving as well.
Surely it can't be too cheap, as it needs to be reliable. We have
looked at this before (copying to a queue area before copying
off-system with a separate process) and it simply isn't great: it
requires more work than you really want to do if you can help it, for
no real benefit.

> I am not sure exactly how many requirements we want to push into a
> callback, to be honest, and surely more requirements pushed to the
> callback increase the odds of implementation mistakes, like a full
> loop. There are already many ways to get archiving wrong, like
> missing a flush of the archived segment before the callback returns
> to ensure its durability.

Without any actual user of any of this, it's surprising to me how much
effort has been put into it. Have I missed the part where someone said
they're actually implementing an archive library, so that we can look
at how it works and how the archive library and the core system could
work better together?

We (pgbackrest) are generally interested in the idea as a way to
reduce startup time, but that's not really a big issue for us
currently, so it hasn't risen to the level of something we're working
on. Not to mention that if the API keeps changing each release, it
just ends up being more work for us for a feature that doesn't gain us
all that much.

Now, all that said, at least in initial discussions we expect the
pgbackrest archive_library to look very similar to how we handle
archive_command and async archiving today: when called, if there are
multiple WAL files to process, we fork off an async process, which
spawns multiple worker processes and does its work moving the WAL
files to the off-system storage. When we're called via archive_command
we just check a status flag to see whether that WAL has been archived
yet by the async process or not.
If it hasn't been archived and no async process is running, we start a
new one (starting a new async process periodically actually makes
things a lot easier for us to test, which is why we don't just keep an
async process running forever; the startup time typically isn't that
big of a deal). If there is a status flag, we return whatever it says,
and if the async process is running but there's no status flag yet, we
wait.

Once we have that going, perhaps there could be some interesting
iteration between pgbackrest and the core code to improve things. But
all this discussion and churn feels more likely to put folks off
trying to implement something using this approach than the opposite,
unless someone in this discussion is actually working on an archive
library, which isn't the impression I've gotten (though if there is
such a work in progress out there, I'd love to see it!).

Thanks,

Stephen