Re: trying again to get incremental backup

Peter Eisentraut Tue, 24 Oct 2023 07:54:07 -0700

On 04.10.23 22:08, Robert Haas wrote:

- I would like some feedback on the generation of WAL summary files.
Right now, I have it enabled by default, and summaries are kept for a
week. That means that, with no additional setup, you can take an
incremental backup as long as the reference backup was taken in the
last week. File removal is governed by mtimes, so if you change the
mtimes of your summary files or whack your system clock around, weird
things might happen. But obviously this might be inconvenient. Some
people might not want WAL summary files to be generated at all because
they don't care about incremental backup, and other people might want
them retained for longer, and still other people might want them to be
not removed automatically or removed automatically based on some
criteria other than mtime. I don't really know what's best here. I
don't think the default policy that the patches implement is
especially terrible, but it's just something that I made up and I
don't have any real confidence that it's wonderful.

The easiest answer is to have it off by default. Let people figure out what works for them. There are various factors like storage, network, server performance, RTO that will determine what combination of full backup, incremental backup, and WAL replay will satisfy someone's requirements. I suppose tests could be set up to determine this to some degree. But we could also start slow and let people figure it out themselves. When pg_basebackup was added, it was also disabled by default.

If we think that 7d is a good setting, then I would suggest considering something like 10d instead. Otherwise, if you do a weekly incremental backup and you have a time change or a hiccup of some kind one day, you lose your backup sequence.

Another possible answer is, like, 400 days? Because why not? What is a reasonable upper limit for this?
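
To illustrate the retention concern, here is a minimal sketch of the reasoning, not of anything in the patch set. It assumes a deliberately simplified model in which an incremental backup is possible only if the time since the reference backup (the backup interval plus any delay) does not exceed the summary retention period. The function name and parameters are hypothetical.

```python
from datetime import timedelta


def summaries_still_available(backup_interval, delay, retention):
    """Return True if the reference backup still falls inside the
    retention window when the next incremental backup starts.

    Simplified model (an assumption for illustration): WAL summaries
    older than `retention` are gone, so the incremental backup needs
    backup_interval + delay <= retention.
    """
    return backup_interval + delay <= retention


weekly = timedelta(days=7)
hiccup = timedelta(hours=1)  # e.g. a late cron run or a clock change

# With 7d retention, a one-hour delay already breaks the sequence.
print(summaries_still_available(weekly, hiccup, timedelta(days=7)))   # False

# With 10d retention, the same delay is absorbed.
print(summaries_still_available(weekly, hiccup, timedelta(days=10)))  # True
```

Under this model, the retention period has to exceed the backup interval by at least the largest delay one expects to tolerate.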

- It's regrettable that we don't have incremental JSON parsing; I
think that means anyone who has a backup manifest that is bigger than
1GB can't use this feature. However, that's also a problem for the
existing backup manifest feature, and as far as I can see, we have no
complaints about it. So maybe people just don't have databases with
enough relations for that to be much of a live issue yet. I'm inclined
to treat this as a non-blocker,

It looks like each file entry in the manifest takes about 150 bytes, so 1 GB would allow for 1024**3/150 = 7158278 files. That seems fine for now?
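
The estimate can be reproduced with a short sketch. The 150-byte figure is the rough per-entry size mentioned above, and the helper name is hypothetical; the result is rounded down to whole entries.

```python
GIB = 1024 ** 3          # the 1 GB limit, taken as 2**30 bytes
BYTES_PER_ENTRY = 150    # rough per-file manifest entry size (estimate)


def max_manifest_entries(limit_bytes=GIB, entry_bytes=BYTES_PER_ENTRY):
    """Number of whole manifest entries that fit under the size limit."""
    return limit_bytes // entry_bytes


print(max_manifest_entries())  # 7158278
```

A larger average entry size (for example, longer file paths) would lower the ceiling proportionally.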

- Right now, I have a hard-coded 60 second timeout for WAL
summarization. If you try to take an incremental backup and the WAL
summaries you need don't show up within 60 seconds, the backup times
out. I think that's a reasonable default, but should it be
configurable? If yes, should that be a GUC or, perhaps better, a
pg_basebackup option?

The current user experience of pg_basebackup is that it waits possibly a long time for a checkpoint, and there is an option to make it go faster, but there is no timeout AFAICT. Is this substantially different? Could we just let it wait forever?

Also, does waiting for the checkpoint and WAL summarization happen in parallel? If so, what if it starts a checkpoint that might take 15 min to complete, and then after 60 seconds it kicks you off because the WAL summarization isn't ready? That might be wasteful.

- I'm curious what people think about the pg_walsummary tool that is
included in 0006. I think it's going to be fairly important for
debugging, but it does feel a little bit bad to add a new binary for
something pretty niche.


This seems fine.

Is the WAL summary file format documented anywhere in your patch set yet? My only thought was, maybe the file format could be human-readable (more like backup_label) to avoid this. But maybe not.
