Hi Robert,

2015-03-06 3:10 GMT+11:00 Robert Haas <robertmh...@gmail.com>:
> But I agree with Fujii to the extent that I see little value in
> committing this patch in the form proposed. Being smart enough to use
> the LSN to identify changed blocks, but then sending the entirety of
> every file anyway because you don't want to go to the trouble of
> figuring out how to revise the wire protocol to identify the
> individual blocks being sent and write the tools to reconstruct a full
> backup based on that data, does not seem like enough of a win.

I believe the main point is to look at this from a user interface point of view. If/when we switch to block-level incremental support, the change will be completely transparent to the end user, even if we start with a file-level approach with an LSN check.

The win is already determined by the average space/time gained by users of VLDBs with a good chunk of read-only data. Our Barman users with incremental backup (released recently; its algorithm can be compared to the file-level backup proposed by Marco) see an average data deduplication ratio of between 50 and 70% of the cluster size.

A tangible example is depicted here, with Navionics saving 8.2TB a week thanks to this approach (and 17 hours instead of 50 for backup time):

http://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/

However, even smaller databases will benefit. It is clear that very small databases, as well as frequently updated ones, won't be interested in incremental backup, but that has never been the use case for this feature.

I believe that if we still think that this approach is not worth it, we are making a big mistake. The way I see it, this patch follows an agile approach and is an important step towards incremental backup on a block basis.

> As Fujii says, if we ship this patch as written, people will just keep
> using the timestamp-based approach anyway.

I think that allowing users to take incremental backups through streaming replication (even if file-based) will give system and database administrators more flexibility in their disaster recovery solutions.

Thanks,
Gabriele