On Thu, Apr 18, 2019 at 6:39 PM Stephen Frost <sfr...@snowman.net> wrote:
> Where is the client going to get the threshold LSN from?
>
> If it doesn't have access to the old backup, then I'm a bit confused as
> to how an incremental backup would be possible?  Isn't that a
> requirement here?
I explained this in the very first email that I wrote on this thread, and then wrote a very extensive further reply on this exact topic to Peter Eisentraut. It's a bit disheartening to see you arguing against my ideas when it's not clear that you've actually read and understood them.

> > The obvious way of extending this system to parallel backup is to have
> > N connections each streaming a separate tarfile such that when you
> > combine them all you recreate the original data directory. That would
> > be perfectly compatible with what I'm proposing for incremental
> > backup. Maybe you have another idea in mind, but I don't know what it
> > is exactly.
>
> So, while that's an obvious approach, it isn't the most sensible- and
> we know that from experience in actually implementing parallel backup
> of PG files. I'm happy to discuss the approach we use in pgBackRest if
> you'd like to discuss this further, but it seems a bit far afield from
> the topic of discussion here and it seems like you're not interested or
> offering to work on supporting parallel backup in core.

If there's some way of modifying my proposal so that it makes life better for external backup tools, I'm certainly willing to consider that, but you're going to have to tell me what you have in mind. If that means describing what pgbackrest does, then do it.

My concern here is that you seem to want a lot of complicated stuff that will require *significant* setup before people can use it. From what I am able to gather from your remarks so far, you think people should archive their WAL to a separate machine, the WAL-summarizer should run there, and data from that should be fed back to the backup client, which should then give the server a list of modified files (and presumably, someday, blocks); the server then returns that data, which the client cross-verifies with checksums and awesome sauce. Which is all fine, but it requires quite a bit of setup and quite a bit of buy-in to the tool.

And I have no problem with people having that level of buy-in to the tool. EnterpriseDB offers a number of tools which require similar levels of setup and configuration, and it's not inappropriate for an enterprise-grade backup tool to have all that stuff. However, for those who may not want to do all that, my original proposal lets you take an incremental backup by doing the following list of steps:

1. Take an incremental backup.

If you'd like, you can also:

0. Enable the WAL-scanning background worker to make incremental backups much faster.

You do not need a WAL archive, you do not need EITHER the backup tool or the server to have access to previous backups, and you do not need the client to have any access to archived WAL or the summary files produced from it. The only thing you need to know is the start-of-backup LSN of the previous backup. (A sketch of that minimal workflow appears below.)

I expect you to reply with a long complaint about how my proposal is totally inadequate, but I actually think that for most people, most of the time, it would be not only adequate but extremely convenient. And despite your protestations to the contrary, it does not block parallelism, checksum verification, or any other cool features that somebody may want to add later. It'll work just fine with those things.

And for the record, I am willing to put some effort into parallelism. I just think that it makes more sense to do the incremental part first. Incremental backup is likely to have less effect on parallel backup than the other way around.
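To make that concrete, here's roughly what I imagine the simple workflow looking like. The --lsn option is hypothetical, just a stand-in for whatever syntax we end up settling on; everything else is stock pg_basebackup:

    # Take a full base backup.
    pg_basebackup -D /backups/full

    # The start-of-backup LSN is recorded in the backup_label file.
    grep 'START WAL LOCATION' /backups/full/backup_label
    # => START WAL LOCATION: 0/2000028 (file 000000010000000000000002)

    # Later, take an incremental backup containing only data modified
    # since that LSN (--lsn is the hypothetical new option).
    pg_basebackup -D /backups/incr1 --lsn '0/2000028'

If the WAL-scanning worker is running, the server can answer the "what changed since 0/2000028?" question from its precomputed summary files; if not, it can still satisfy the request by examining the data itself, just more slowly.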
What I'm NOT willing to do is build a whole bunch of infrastructure that will help pgbackrest do amazing things but will not provide a simple and convenient way of taking incremental backups using only core tools. I do care about having something that's good for pgbackrest and other out-of-core tools. I just care about it MUCH LESS than I care about making PostgreSQL core awesome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company