On Tue, Jul 30, 2019 at 1:28 AM Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova
> <a.lubennik...@postgrespro.ru> wrote:
> > In attachments, you can find a prototype of incremental pg_basebackup,
> > which consists of 2 features:
> >
> > 1) To perform incremental backup one should call pg_basebackup with a
> > new argument:
> >
> > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
> >
> > where lsn is a start_lsn of parent backup (can be found in
> > "backup_label" file)
> >
> > It calls BASE_BACKUP replication command with a new argument
> > PREV_BACKUP_START_LSN 'lsn'.
> >
> > For datafiles, only pages with LSN > prev_backup_start_lsn will be
> > included in the backup.
> > They are saved into 'filename.partial' file, 'filename.blockmap' file
> > contains an array of BlockNumbers.
> > For example, if we backuped blocks 1,3,5, filename.partial will contain
> > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}.
>
> I think it's better to keep both the information about changed blocks
> and the contents of the changed blocks in a single file. The list of
> changed blocks is probably quite short, and I don't really want to
> double the number of files in the backup if there's no real need. I
> suspect it's just overall a bit simpler to keep everything together.
> I don't think this is a make-or-break thing, and welcome contrary
> arguments, but that's my preference.
>

I have worked on a similar product, and I agree with Robert: keeping the
changed-block information and the changed blocks themselves in a single
file makes more sense. +1

> > 2) To merge incremental backup into a full backup call
> >
> > pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir'
> > --merge-backups
> >
> > It will move all files from 'incremental_basedir' to 'basedir' handling
> > '.partial' files correctly.
>
> This, to me, looks like it's much worse than the design that I
> proposed originally. It means that:
>
> 1. You can't take an incremental backup without having the full backup
> available at the time you want to take the incremental backup.
>
> 2. You're always storing a full backup, which means that you need more
> disk space, and potentially much more I/O while taking the backup.
> You save on transfer bandwidth, but you add a lot of disk reads and
> writes, costs which have to be paid even if the backup is never
> restored.
>
> > 1) Whether we collect block maps using simple "read everything page by
> > page" approach
> > or WAL scanning or any other page tracking algorithm, we must choose a
> > map format.
> > I implemented the simplest one, while there are more ideas:
>
> I think we should start simple.
>
> I haven't had a chance to look at Jeevan's patch at all, or yours in
> any detail, as yet, so these are just some very preliminary comments.
> It will be good, however, if we can agree on who is going to do what
> part of this as we try to drive this forward together. I'm sorry that
> I didn't communicate EDB's plans to work on this more clearly;
> duplicated effort serves nobody well.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Ibrar Ahmed
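
For illustration, the single-file layout discussed in this thread could look
roughly like the Python sketch below: a small header, the array of block
numbers, then the block contents in the same order. This is only a sketch
under stated assumptions. The MAGIC marker, the little-endian 32-bit
encoding, and the function names are hypothetical and are not taken from
either patch; only the 8192-byte default block size comes from PostgreSQL.

```python
import io
import struct

BLCKSZ = 8192      # PostgreSQL's default block size
MAGIC = b"PART"    # hypothetical file marker, assumed for this sketch


def write_partial(out, blocks):
    """Write {block_number: page_bytes} as one file: header, map, pages."""
    numbers = sorted(blocks)
    # Header: marker plus the number of changed blocks.
    out.write(MAGIC + struct.pack("<I", len(numbers)))
    # Block map: the changed block numbers, in ascending order.
    out.write(struct.pack(f"<{len(numbers)}I", *numbers))
    # Block contents, in the same order as the map.
    for number in numbers:
        page = blocks[number]
        if len(page) != BLCKSZ:
            raise ValueError(f"block {number} is not {BLCKSZ} bytes")
        out.write(page)


def read_partial(src):
    """Read a file written by write_partial back into a dict."""
    if src.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a partial file")
    (count,) = struct.unpack("<I", src.read(4))
    numbers = struct.unpack(f"<{count}I", src.read(4 * count))
    return {number: src.read(BLCKSZ) for number in numbers}


# Example from the thread: blocks 1, 3 and 5 changed.
changed = {n: bytes([n]) * BLCKSZ for n in (1, 3, 5)}
buf = io.BytesIO()
write_partial(buf, changed)
buf.seek(0)
restored = read_partial(buf)
```

With three changed blocks the map costs 12 bytes next to 24 KB of page
data, which is the sense in which the list of changed blocks is short
compared with the contents it describes.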