Re: block-level incremental backup

Ibrar Ahmed Tue, 30 Jul 2019 06:28:01 -0700

On Tue, Jul 30, 2019 at 1:28 AM Robert Haas <robertmh...@gmail.com> wrote:


> On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova
> <a.lubennik...@postgrespro.ru> wrote:
> > In attachments, you can find a prototype of incremental pg_basebackup,
> > which consists of 2 features:
> >
> > 1) To perform incremental backup one should call pg_basebackup with a
> > new argument:
> >
> > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
> >
> > where lsn is a start_lsn of parent backup (can be found in
> > "backup_label" file)
> >
> > It calls BASE_BACKUP replication command with a new argument
> > PREV_BACKUP_START_LSN 'lsn'.
> >
> > For datafiles, only pages with LSN > prev_backup_start_lsn will be
> > included in the backup.
> > They are saved into 'filename.partial' file, 'filename.blockmap' file
> > contains an array of BlockNumbers.
> > For example, if we backed up blocks 1,3,5, filename.partial will contain
> > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}.
>
> I think it's better to keep both the information about changed blocks
> and the contents of the changed blocks in a single file.  The list of
> changed blocks is probably quite short, and I don't really want to
> double the number of files in the backup if there's no real need. I
> suspect it's just overall a bit simpler to keep everything together.
> I don't think this is a make-or-break thing, and welcome contrary
> arguments, but that's my preference.
>

I have worked on a similar product, and I agree with Robert: keeping the
changed-block list and the changed blocks themselves in a single file makes
more sense.
+1
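
To make the single-file idea concrete, here is a rough sketch of one possible
layout: a small header, then the array of block numbers, then the page
contents in the same order. This is only an illustration and is not taken
from either patch; the magic value, the field widths, and the function names
are all hypothetical, and the 8192-byte page size is simply the PostgreSQL
default.

```python
import struct

BLCKSZ = 8192      # assumed page size (PostgreSQL default)
MAGIC = b"INCR"    # hypothetical 4-byte file magic, not from any patch


def pack_partial(blocks):
    """Build one file image from {block_number: page_bytes}.

    Layout (all hypothetical): magic, uint32 block count,
    uint32 block numbers in ascending order, then the pages
    themselves in that same order.
    """
    numbers = sorted(blocks)
    for n in numbers:
        if len(blocks[n]) != BLCKSZ:
            raise ValueError("block %d is not %d bytes" % (n, BLCKSZ))
    header = MAGIC + struct.pack("<I", len(numbers))
    blockmap = struct.pack("<%dI" % len(numbers), *numbers)
    pages = b"".join(blocks[n] for n in numbers)
    return header + blockmap + pages


def unpack_partial(data):
    """Inverse of pack_partial: return {block_number: page_bytes}."""
    if data[:4] != MAGIC:
        raise ValueError("bad magic")
    (count,) = struct.unpack_from("<I", data, 4)
    numbers = struct.unpack_from("<%dI" % count, data, 8)
    offset = 8 + 4 * count
    if len(data) != offset + count * BLCKSZ:
        raise ValueError("file length does not match block count")
    return {
        n: data[offset + i * BLCKSZ: offset + (i + 1) * BLCKSZ]
        for i, n in enumerate(numbers)
    }
```

For the blocks 1,3,5 example quoted above, the map costs 12 bytes on top of an
8-byte header, against 24 kB of page data, so folding it into the same file
adds very little while halving the number of files.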

>
> > 2) To merge incremental backup into a full backup call
> >
> > pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir'
> > --merge-backups
> >
> > It will move all files from 'incremental_basedir' to 'basedir' handling
> > '.partial' files correctly.
>
> This, to me, looks like it's much worse than the design that I
> proposed originally.  It means that:
>
> 1. You can't take an incremental backup without having the full backup
> available at the time you want to take the incremental backup.
>
> 2. You're always storing a full backup, which means that you need more
> disk space, and potentially much more I/O while taking the backup.
> You save on transfer bandwidth, but you add a lot of disk reads and
> writes, costs which have to be paid even if the backup is never
> restored.
>
> > 1) Whether we collect block maps using simple "read everything page by
> > page" approach
> > or WAL scanning or any other page tracking algorithm, we must choose a
> > map format.
> > I implemented the simplest one, while there are more ideas:
>
> I think we should start simple.
>
> I haven't had a chance to look at Jeevan's patch at all, or yours in
> any detail, as yet, so these are just some very preliminary comments.
> It will be good, however, if we can agree on who is going to do what
> part of this as we try to drive this forward together.  I'm sorry that
> I didn't communicate EDB's plans to work on this more clearly;
> duplicated effort serves nobody well.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

-- 
Ibrar Ahmed
