On Tue, Oct 24, 2023 at 10:53 AM Peter Eisentraut <pe...@eisentraut.org> wrote:
> The easiest answer is to have it off by default.  Let people figure out
> what works for them.  There are various factors like storage, network,
> server performance, RTO that will determine what combination of full
> backup, incremental backup, and WAL replay will satisfy someone's
> requirements.  I suppose tests could be set up to determine this to some
> degree.  But we could also start slow and let people figure it out
> themselves.  When pg_basebackup was added, it was also disabled by default.
>
> If we think that 7d is a good setting, then I would suggest to consider,
> like 10d.  Otherwise, if you do a weekly incremental backup and you have
> a time change or a hiccup of some kind one day, you lose your backup
> sequence.
>
> Another possible answer is, like, 400 days?  Because why not?  What is a
> reasonable upper limit for this?

In concept, I don't think this should even be time-based. What you
should do is remove WAL summaries once you know you've taken all the
incremental backups that might ever use them. But PostgreSQL itself
doesn't have any way of knowing what your
intended backup patterns are. If your incremental backup fails on
Monday night and you run it manually on Tuesday morning, you might
still rerun it as an incremental backup. If it fails every night for a
month and you finally realize and decide to intervene manually, maybe
you want a new full backup at that point. It's been a month. But on
the other hand maybe you don't. There's no time-based answer to this
question that is really correct, and I think it's quite possible that
your backup software might want to shut off time-based deletion
altogether and make its own decisions about when to nuke summaries.
However, I also don't think that's a great default setting. It could
easily lead to people wasting a bunch of disk space for no reason.
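
For illustration, here's roughly what a time-based cleanup could look
like. The directory name, the variable name, the 7d default, and the
"zero disables removal" convention are all made-up details for the
sake of the example, not necessarily what the patch actually does:

    /*
     * Illustrative sketch only: drop WAL summary files whose modification
     * time is older than a configured keep time. All names and conventions
     * here are assumptions for the example.
     */
    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>
    #include <unistd.h>

    static int wal_summary_keep_seconds = 7 * 24 * 60 * 60;  /* hypothetical 7d default */

    static void
    remove_old_wal_summaries(const char *summary_dir)
    {
        DIR        *dir;
        struct dirent *de;
        time_t      cutoff = time(NULL) - wal_summary_keep_seconds;

        /* 0 would mean "never remove"; backup software that wants to manage
         * summary retention itself could rely on that. */
        if (wal_summary_keep_seconds == 0)
            return;

        dir = opendir(summary_dir);
        if (dir == NULL)
            return;

        while ((de = readdir(dir)) != NULL)
        {
            char        path[1024];
            struct stat st;

            if (de->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path), "%s/%s", summary_dir, de->d_name);
            if (stat(path, &st) == 0 && st.st_mtime < cutoff)
                unlink(path);   /* older than the cutoff: remove it */
        }
        closedir(dir);
    }

A zero keep time in that sketch is the knob I'd imagine backup software
using to shut off time-based deletion entirely.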

As for the 7d value, I figured that nightly incremental backups
would be fairly common. If we think weekly incremental backups would
be common, then pushing this out to 10d would make sense. While
there's no reason you couldn't take an annual incremental backup, and
thus want a 400d value, it seems like a pretty niche use case.

Note that whether to remove summaries is a separate question from
whether to generate them in the first place. Right now, I have
wal_summarize_mb controlling whether they get generated at all, but as
I noted in another recent email, that isn't an entirely satisfying
solution.

> It looks like each file entry in the manifest takes about 150 bytes, so
> 1 GB would allow for 1024**3/150 = 7158278 files.  That seems fine for now?

I suspect a few people have more files than that. They'll just have to
wait to use this feature until we get incremental JSON parsing (or
undo the decision to use JSON for the manifest).

> The current user experience of pg_basebackup is that it waits possibly a
> long time for a checkpoint, and there is an option to make it go faster,
> but there is no timeout AFAICT.  Is this substantially different?  Could
> we just let it wait forever?

We could. I installed the timeout because the first versions of the
feature were buggy, and I didn't like having my tests hang forever
with no indication of what had gone wrong. At least in my experience
so far, the time spent waiting for WAL summarization is typically
quite short, because the only WAL that needs to be summarized is
whatever was emitted between the last time the summarizer woke up and
the start LSN of the backup. That's probably not much, and the next time
the summarizer wakes up, the file should appear within moments. So,
it's a little different from the checkpoint case, where long waits are
expected.
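
To make the shape of that wait concrete, here's a simplified sketch;
the helper name, the polling interval, and the stub implementation are
invented for illustration (the real code waits on the summarizer
rather than polling blindly like this):

    /*
     * Simplified sketch of waiting for WAL summarization to reach the
     * backup's start LSN, with a timeout so a stuck summarizer can't hang
     * the backup forever. summaries_cover_lsn() is a stand-in for whatever
     * state the summarizer really exposes.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    typedef uint64_t XLogRecPtr;

    /* Stub: in reality this would consult the summary files or shared
     * memory state maintained by the WAL summarizer process. */
    static bool
    summaries_cover_lsn(XLogRecPtr lsn)
    {
        (void) lsn;
        return true;
    }

    static bool
    wait_for_wal_summarization(XLogRecPtr start_lsn, int timeout_seconds)
    {
        time_t      deadline = time(NULL) + timeout_seconds;

        while (!summaries_cover_lsn(start_lsn))
        {
            if (time(NULL) >= deadline)
                return false;   /* timed out: error out rather than hang */
            usleep(100 * 1000); /* poll every 100ms for the sake of the sketch */
        }
        return true;
    }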

> Also, does waiting for checkpoint and WAL summarization happen in
> parallel?  If so, what if it starts a checkpoint that might take 15 min
> to complete, and then after 60 seconds it kicks you off because the WAL
> summarization isn't ready.  That might be wasteful.

It is not parallel. The trouble is, we don't really have any way to
know whether WAL summarization is going to fail for whatever reason.
We don't expect that to happen, but if somebody changes the
permissions on the WAL summary directory or attaches gdb to the WAL
summarizer process or something of that sort, it might.

We could check at the outset whether we seem to be really far behind
and emit a warning. For instance, if we're 1TB behind on WAL
summarization when the checkpoint is requested, chances are something
is busted and we're probably not going to catch up any time soon. We
could warn the user about that and let them make their own decision
about whether to cancel. But, that idea won't help in unattended
operation, and the threshold for "really far behind" is not very
clear. It might be better to wait until we get more experience with
how things actually fail before doing too much engineering here, but
I'm also open to suggestions.
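
In code, the heuristic I have in mind is nothing more than the check
below; the 1TB threshold and all of the names are placeholders, and
picking that threshold is exactly the part that isn't clear:

    /*
     * Sketch of a "warn if summarization looks hopelessly behind" check.
     * The threshold and every name here are illustrative assumptions.
     */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;

    #define SUMMARY_LAG_WARN_BYTES  ((uint64_t) 1024 * 1024 * 1024 * 1024) /* 1TB */

    static void
    maybe_warn_summarization_lag(XLogRecPtr backup_start_lsn,
                                 XLogRecPtr summarized_lsn)
    {
        if (backup_start_lsn > summarized_lsn &&
            backup_start_lsn - summarized_lsn > SUMMARY_LAG_WARN_BYTES)
            fprintf(stderr,
                    "WARNING: WAL summarization is %" PRIu64 " bytes behind; "
                    "this backup may wait a long time or hit the timeout\n",
                    (uint64_t) (backup_start_lsn - summarized_lsn));
    }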

> Is the WAL summary file format documented anywhere in your patch set
> yet?  My only thought was, maybe the file format could be human-readable
> (more like backup_label) to avoid this.  But maybe not.

The comment in blkreftable.c just above "#define BLOCKS_PER_CHUNK"
gives an overview of the format. I think that we probably don't want
to convert to a text format, because this format is extremely
space-efficient and very convenient to transfer between disk and
memory. We don't want to run out of memory when summarizing large
ranges of WAL, or when taking an incremental backup that requires
combining many individual summaries into a combined summary that tells
us what needs to be included in the backup.
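
If it helps, the gist of the chunked representation is something like
the struct below. This is a deliberately simplified illustration of
the idea, not the actual in-memory or on-disk layout, so please read
blkreftable.c for the real details:

    /*
     * Simplified illustration of a chunked block reference table: block
     * numbers for each relation fork are grouped into chunks of 2^16
     * blocks, and each chunk stores either a small sorted array of 2-byte
     * offsets (few modified blocks) or a bitmap (many modified blocks).
     * The array size and layout below are made up for the example.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define BLOCKS_PER_CHUNK    (1 << 16)

    typedef struct BlockRefChunk
    {
        bool        is_bitmap;      /* which representation is in use */
        uint16_t    nused;          /* offsets used when is_bitmap is false */
        union
        {
            uint16_t    offsets[16];                    /* sparse case */
            uint8_t     bitmap[BLOCKS_PER_CHUNK / 8];   /* dense case: 1 bit/block */
        }           u;
    } BlockRefChunk;

    /*
     * A modified block number maps to chunk = blkno / BLOCKS_PER_CHUNK and
     * offset = blkno % BLOCKS_PER_CHUNK; a chunk would switch from the
     * array form to the bitmap form once the array stops being smaller.
     */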

-- 
Robert Haas
EDB: http://www.enterprisedb.com

