On 22.04.2019 2:02, Robert Haas wrote:
On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost <sfr...@snowman.net> wrote:
Having been around for a while working on backup-related things, if I
was to implement the protocol for pg_basebackup today, I'd definitely
implement "give me a list" and "give me this file" rather than the
tar-based approach, because I've learned that people want to be
able to do parallel backups and that's a decent way to do that.  I
wouldn't set out and implement something new that there's just no hope
of making parallel.  Maybe the first write of pg_basebackup would still
be simple and serial since it's certainly more work to make a frontend
tool like that work in parallel, but at least the protocol would be
ready to support a parallel option being added later without being
rewritten.

And that's really what I was trying to get at here- if we've got the
choice now to decide what this is going to look like from a protocol
level, it'd be great if we could make it able to support being used in a
parallel fashion, even if pg_basebackup is still single-threaded.
I think we're getting closer to a meeting of the minds here, but I
don't think it's intrinsically necessary to rewrite the whole method
of operation of pg_basebackup to implement incremental backup in a
sensible way.  One could instead just do a straightforward extension
to the existing BASE_BACKUP command to enable incremental backup.
Then, to enable parallel full backup and all sorts of out-of-core
hacking, one could expand the command language to allow tools to
access individual steps: START_BACKUP, SEND_FILE_LIST,
SEND_FILE_CONTENTS, STOP_BACKUP, or whatever.  The second thing makes
for an appealing project, but I do not think there is a technical
reason why it has to be done first.  Or for that matter why it has to
be done second.  As I keep saying, incremental backup and full backup
are separate projects and I believe it's completely reasonable for
whoever is doing the work to decide on the order in which they would
like to do the work.
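
To make the shape of that command language concrete, here is a rough
libpq sketch of how a parallel client might drive it. None of these
commands exist today: the names are the placeholders from the previous
paragraph, and the assumption that SEND_FILE_LIST returns one row per
file is invented for illustration, with error handling and the actual
data transfer omitted:

/*
 * Hypothetical sketch only: START_BACKUP, SEND_FILE_LIST,
 * SEND_FILE_CONTENTS and STOP_BACKUP are not-yet-designed commands,
 * not anything the server implements today.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

#define NUM_WORKERS 4

static PGconn *
connect_replication(void)
{
    /* A replication connection, as BASE_BACKUP uses today. */
    PGconn *conn = PQconnectdb("replication=database dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        exit(1);
    }
    return conn;
}

int
main(void)
{
    PGconn     *control = connect_replication();
    PGresult   *res;
    int         nfiles;
    int         i;

    /* One control connection establishes the backup... */
    PQclear(PQexec(control, "START_BACKUP"));

    /* ...and fetches the manifest, assumed to be one row per file. */
    res = PQexec(control, "SEND_FILE_LIST");
    nfiles = PQntuples(res);

    /*
     * Each worker opens its own replication connection and pulls a
     * share of the files.  (Serial here for brevity; a real client
     * would use threads or processes.)
     */
    for (i = 0; i < NUM_WORKERS; i++)
    {
        PGconn     *worker = connect_replication();
        int         j;

        for (j = i; j < nfiles; j += NUM_WORKERS)
        {
            char        cmd[1024];

            snprintf(cmd, sizeof(cmd), "SEND_FILE_CONTENTS '%s'",
                     PQgetvalue(res, j, 0));
            /* Receive the file data and write it out here. */
            PQclear(PQexec(worker, cmd));
        }
        PQfinish(worker);
    }

    PQclear(res);
    PQclear(PQexec(control, "STOP_BACKUP"));
    PQfinish(control);
    return 0;
}

The point is only that once the steps are separable, each worker can
hold its own connection, which is what lets a parallel option be added
without rewriting the serial tool.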

Having said that, I'm curious what people other than Stephen (and
other pgbackrest hackers) think about the relative value of parallel
backup vs. incremental backup.  Stephen appears quite convinced that
parallel backup is full of win and incremental backup is a bit of a
yawn by comparison, and while I certainly would not want to discount
the value of his experience in this area, it sometimes happens on this
mailing list that [ drum roll please ] not everybody agrees about
everything.  So, what do other people think?


Based on the experience of pg_probackup users, I can say that there is no 100% winner: depending on the use case, either parallel or incremental backups are preferable.
- If the database is not so large and the update rate is high enough, then a parallel backup within one data center is definitely the more efficient solution.
- If the database is very large and data is rarely updated, or the database is mostly append-only, then incremental backup is preferable.
- Some customers need to collect, at a central server, backups of databases installed at many nodes with slow and unreliable connections (imagine a DBMS installed on locomotives). Parallelism definitely cannot help here, unlike support for incremental backup.
- A parallel backup consumes system resources more aggressively, interfering with the normal work of the application, so performing a parallel backup may cause significant degradation of application speed.

pg_probackup supports both features, parallel and incremental backups, and it is up to the user to combine them in the way that is most efficient for a particular configuration.
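
For instance (a sketch only: -j/--threads sets the number of parallel worker threads and -b selects the backup mode, PAGE being one of the incremental modes; the backup path and instance name here are made up):

pg_probackup backup -B /backup -j 4 -b FULL --instance node1
pg_probackup backup -B /backup -j 4 -b PAGE --instance node1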



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


