On 4/11/24 20:51, Tomas Vondra wrote:
On 4/11/24 02:01, David Steele wrote:

I have a hard time seeing this feature as very useful, especially
for large databases, until pg_combinebackup works on tar (and compressed
tar). Right now restoring an incremental backup requires at least twice
the space of the original cluster, which is going to take a lot of users
by surprise.

I do agree it'd be nice if pg_combinebackup worked with .tar directly,
without having to extract the archives first. No argument there, but
as I said in the other thread, I believe that's something we can add
later. That's simply how incremental development works.
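
To be clear about what "extract first" means today, the workaround is
roughly this (paths hypothetical, assuming tar-format backups taken with
pg_basebackup -Ft -z, and that each extracted directory ends up with its
backup_manifest):

    # extract every backup in the chain before combining
    mkdir -p /scratch/full /scratch/incr1
    tar -xzf /backups/full/base.tar.gz -C /scratch/full
    tar -xzf /backups/incr1/base.tar.gz -C /scratch/incr1

    # list the full backup first, then the incrementals, oldest to newest
    pg_combinebackup -o /scratch/restored /scratch/full /scratch/incr1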

OK, sure, but if the plan is to make it practical later, doesn't that make the feature something to be avoided now?

I know you have made some improvements here for CoW filesystems, but my
experience is that Postgres is generally not run on such filesystems,
though that is changing a bit.

I'd say XFS is a pretty common choice, for example. And it's one of the
filesystems that work great with pg_combinebackup.

XFS has certainly advanced more than I was aware.

However, who says this has to be the filesystem the Postgres instance
runs on? Who in their right mind puts backups on the same volume as the
instance anyway? At which point it can be a different filesystem, even
if it's not ideal for running the database.

My experience is that these days backups are generally placed in object stores. Sure, people are still using NFS, but admins rarely have much control over those volumes. They may or may not be CoW filesystems.

FWIW I think it's fine to tell users that to minimize the disk space
requirements, they should use a CoW filesystem and --copy-file-range.
The docs don't say that currently, that's true.
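
Something like this is what I mean (paths hypothetical; note that the
output directory has to be on the same filesystem as the input backups,
otherwise copy_file_range can't share blocks):

    # on a CoW filesystem (btrfs, XFS with reflink=1, ...)
    pg_combinebackup --copy-file-range -o /mnt/backup/restored \
        /mnt/backup/full /mnt/backup/incr1 /mnt/backup/incr2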

That would probably be a good addition to the docs.

All of this also depends on how people do the restore. With the CoW
stuff they can do a quick (and small) copy on the backup server, and
then copy the result to the actual instance. Or they can do the restore
on the target directly (e.g. by mounting an r/o volume with the
backups), in which case the CoW won't really help.
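
Roughly (paths and hostnames hypothetical, rsync just one way of
shipping the result):

    # (a) cheap CoW copy on the backup server, then ship to the instance
    pg_combinebackup --copy-file-range -o /mnt/backup/restored \
        /mnt/backup/full /mnt/backup/incr1
    rsync -a /mnt/backup/restored/ db1:/var/lib/postgresql/17/main/

    # (b) restore directly on the target from a read-only mount of the
    # backups; plain copies across filesystems, so CoW doesn't help here
    pg_combinebackup -o /var/lib/postgresql/17/main \
        /mnt/backups-ro/full /mnt/backups-ro/incr1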

And again, this all requires a significant amount of setup and tooling. Obviously I believe good backups require effort, but doing this right gets very complicated due to the limitations of the tool.

But yeah, having to keep the backups as expanded directories is not
great, I'd love to have .tar. Not necessarily because of the disk space
(in my experience the compression in filesystems works quite well for
this purpose), but mostly because it's more compact and allows working
with backups as a single piece of data (e.g. it's much clearer what the
checksum of a single .tar is, compared to a directory).

But again, object stores are commonly used for backup these days, and billing is based on the amount of data stored rather than on any compression the underlying filesystem might apply. Of course you'd want to store compressed tars in the object store, but that means keeping an expanded copy somewhere just to run pg_combinebackup.
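
To spell out the flow I'm describing (bucket and paths hypothetical, the
aws CLI purely for illustration), it's download, extract, and only then
combine, with the expanded copies in /scratch being the extra cost:

    mkdir -p /scratch/full /scratch/incr1
    aws s3 cp s3://pg-backups/full/base.tar.gz - | tar -xzf - -C /scratch/full
    aws s3 cp s3://pg-backups/incr1/base.tar.gz - | tar -xzf - -C /scratch/incr1
    pg_combinebackup -o /var/lib/postgresql/17/main /scratch/full /scratch/incr1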

But if the argument is that all this can/will be fixed in the future, I guess the smart thing for users to do is wait a few releases for incremental backups to become a practical feature.

Regards,
-David

